Methods and Applications: Corpus Linguistics

Teachers: Victoria Kamasa & Katarzyna Klessa (vkamasa | klessa @amu.edu.pl)

Students’ page (more materials)

  • Topic 01: (2017-10-04) Introduction: basic concepts: corpus, corpus linguistics, main types of corpora. View Slides.
  • Topic 02: (2017-10-11) Continued: types of corpora, selected concepts used in the everyday practice of corpus linguistics. View Slides.
  • Topic 03: Referential corpora: compare and contrast. Please sign in to the Corpus linguistics course in Moodle. Use your USOS credentials to log in. More instructions about Moodle can be found in the Slides from Class 2. Your tasks for this part are in the section for week: 18 October – 24 October.
  • Topic 04: Specialized corpora. Please find the class materials in the Corpus linguistics course in Moodle (week: 25 October – 31 October).
  • Topic 05: (2017-11-08) Corpus annotation. View Slides.
  • Topic 06: Speech corpora in phonetic research. Annotation of prosodic features in the PoInt Corpus. Please find the class materials in the Corpus linguistics course in Moodle (week: 15 November – 21 November).
  • Topic 07-08: Annotation practice: intonation labelling (the tasks are set up as a workshop in Moodle for week: 29 November – 5 December; more than one week is scheduled for the exercises).
  • Topic 09: Text corpora. Annotation & corpus mining. AntConc as an example of a corpus analysis tool – intro & revision.
  • Topic 10: AntConc practice (Moodle).
  • Topic 11: CLARIN-PL: tools and resources for text & speech corpus processing, part 1, Slides.
  • Topic 12: CLARIN-PL: tools and resources for text & speech corpus processing, part 2, Slides.
  • Topic 13: Exercises (CLARIN-PL speech tools)
  • Topic 14: Revision & summary.

References:

  • Anthony, L. (2005). AntConc: A Learner and Classroom Friendly, Multi-Platform Corpus Analysis Toolkit. Proceedings of IWLeL 2004: An Interactive Workshop on Language e-Learning, pp. 7-13, see also: http://www.laurenceanthony.net/software.html
  • Baker, P. (Ed.). (2009). Contemporary corpus linguistics. London, New York: Continuum.
  • Baker, P., Hardie, A., & McEnery, T. (2006). A glossary of corpus linguistics. Edinburgh: Edinburgh University Press.
  • Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, see: http://www.nltk.org/book/, http://www.nltk.org/ and (for beginners) Python course at: https://www.codecademy.com/learn/learn-python
  • Boersma, P. & Weenink, D. (2013). Praat: doing phonetics by computer [Computer program]. Ver. 5.3.51, retrieved 2.06.2013 from www.praat.org
  • CLARIN-PL: https://clarin-pl.eu
  • CLARIN: https://www.clarin.eu/
  • Dimitriadis, A., & Musgrave, S. (2009). Designing linguistic databases: A primer for linguists (p. 13). Berlin: Walter de Gruyter.
  • Klessa, K. (2015). Annotation Pro [Software tool]. Ver. 2.2.4.0. Retrieved from: annotationpro.org on 2015-05-19.
  • Klessa, K., Karpiński, M., & Wagner, A. (2013). Annotation Pro – a new software tool for annotation of linguistic and paralinguistic features. In D. Hirst & B. Bigi (Eds.), Proceedings of the Tools and Resources for the Analysis of Speech Prosody (TRASP) Workshop, Aix en Provence, 51-54.
  • Barbara Konat’s website (cf. the video presented during the class on different approaches and uses of annotation of linguistic data): http://bkonat.pl/
  • Koržinek, D., Marasek, K., Brocki, Ł., & Wołk, K. (2017). Polish Read Speech Corpus for Speech Tools and Services. arXiv preprint arXiv:1706.00245.
  • McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge textbooks in linguistics. Cambridge, New York: Cambridge University Press.
  • McEnery, T., & Wilson, A. (2001). Corpus linguistics: An introduction (2nd ed.). Edinburgh textbooks in empirical linguistics. Edinburgh: Edinburgh University Press.
  • Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235-244.
  • Sloetjes, H., & Wittenburg, P. (2008). Annotation by category – ELAN and ISO DCR. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), on-line: https://tla.mpi.nl/tools/tla-tools/elan/
  • Viana, V., Zyngier, S., & Barnbrook, G. (Eds.). (2011). Studies in corpus linguistics: v. 48. Perspectives on corpus linguistics. Amsterdam, Philadelphia: J. Benjamins Pub.
  • Tang, W. (2011). A Simple Guide to Using AntConc. Retrieved from http://www.laurenceanthony.net/software/antconc/resources/help_AntConc321_english.pdf
  • Zaśko-Zielińska, M., Piasecki, M., & Szpakowicz, S. (2015). A large wordnet-based sentiment lexicon for Polish. In Proceedings of the International Conference Recent Advances in Natural Language Processing (pp. 721-730).