So I’m also taking that new MOOC that Carol Goodey is talking about. They do an excellent job of balancing the rigour of the syllabus with the affability of the lecturer. The course is designed so as to be generous in the amount of material it provides, but forgiving of those who are not industrious enough to complete it all. However, I do have a question. I know this is a naive, come-in-late-without-having-done-the-reading thing to ask, but here goes.
I did some very ad hoc, unscientific corpus work with my students a few years ago. To help them see how a vocabulary word was used in a natural context, I asked them to enter the word as a search term on Google News. They would of course find examples of the word in journalistic sentences. It worked really well on several levels. The students were interested to see how their words could be relevant to current events; they also picked up a wealth of topical vocabulary and cultural references.
The problem was that in many cases, the word was being used wrongly. Sometimes the errors occurred in English-language newspapers published in non-English-speaking countries. Sometimes they showed up in less prestigious papers from English-speaking countries. Now I know I’m landing myself in the middle of an EIL, descriptive/prescriptive debate here, but unless we go totally passive and descriptive and just let the multiple versions of the language wash over us, this does pose a problem. If we are to teach that there is a right answer and a wrong answer, or even that one answer is better than another in a certain situation, we need more support than that.
So we could limit our corpus to, say, British and North American sources. That would get rid of the complication of World English idiosyncrasies. But would our language lose some of its richness and diversity? Some of the most celebrated writers of English alive today come from countries whose main language is not English. Is it not perhaps racist to imply that the language spoken there is inferior?
What of the lower-level writing in a predominantly English-speaking country? How do we draw the line between good newspapers and tabloids when the lines are so blurred? Can we really say that we will use data from, for instance, the Guardian, and not the Daily Mail? And couldn’t these distinctions be seen as classist?
My point is that quantitative data alone may not be sufficient to determine English usage, especially in a way that would be useful to English learners. If the scope of the corpus is too broad, we risk confusing the students with non-standard forms. If it is too narrow, it will be too much a reflection of the prejudices of the compiler. If the corpus is to support my teaching, I will have to pick and choose which elements to use. Does that not then defeat the purpose of the whole exercise?
The use of corpora can be refreshingly objective and democratic. At the same time, though, this practice may force us to confront the limits of our linguistic tolerance. Perhaps every language user reaches the point of saying, “This is the right way to say it; this is not.” In corpora, however, the decision is reached on the basis of quantity rather than quality. I’m still not convinced that sheer numbers are the right way to determine appropriate language usage.
And then there’s the question of cliches, but I think I need a whole new post for that.