
Data-Science-for-Linguists-2023/Morpheme-Acquisition-Analysis


Morpheme-Acquisition-Analysis

A term project by Sen Sub Laban (sen.s@pitt.edu).

Carried out from Feb. 10, 2023 to Apr. 30, 2023.

Project guestbook

Summary

This project investigates the order in which grammatical morphemes are acquired by native English speakers and by English language learners. Building on the 'natural order' studies that have been influential in linguistics and second language acquisition research, the project takes a data-science approach to analyzing the developmental patterns of grammatical morphemes.
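The references include Webber et al.'s (2010) rank-biased overlap (RBO), a similarity measure for rankings that weights agreement near the top of the lists more heavily than agreement further down. As a minimal sketch of how a native-speaker morpheme order might be compared with a learner order, the function below computes the extrapolated RBO for two rankings, assuming both rank the same set size of morphemes with no ties. The function name `rbo_ext` and the default persistence `p = 0.9` are illustrative choices, not taken from this project's notebooks, and the project may have computed its comparison differently.

```python
def rbo_ext(s: list[str], t: list[str], p: float = 0.9) -> float:
    """Extrapolated rank-biased overlap (Webber et al., 2010) for two rankings.

    Assumption of this sketch: both rankings have the same length and contain
    no ties or repeated items.
    """
    if len(s) != len(t):
        raise ValueError("this sketch assumes rankings of equal length")
    if not s:
        raise ValueError("rankings must not be empty")
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")

    k = len(s)
    seen_s: set[str] = set()
    seen_t: set[str] = set()
    overlap = 0          # X_d: size of the intersection of the two depth-d prefixes
    weighted_sum = 0.0   # sum over d of (X_d / d) * p**d

    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        # Update X_d incrementally from X_{d-1}.
        if a == b:
            overlap += 1
        else:
            if a in seen_t:  # new item of s already appeared in t's prefix
                overlap += 1
            if b in seen_s:  # new item of t already appeared in s's prefix
                overlap += 1
        seen_s.add(a)
        seen_t.add(b)
        weighted_sum += (overlap / d) * p ** d

    # Extrapolated RBO for equal-length lists: the agreement at depth k is
    # assumed to continue unchanged beyond the end of the lists.
    return (overlap / k) * p ** k + ((1 - p) / p) * weighted_sum


if __name__ == "__main__":
    # Toy orders invented for illustration only; they are not results of this project.
    native_order = ["-ing", "plural -s", "past -ed", "3rd person -s"]
    learner_order = ["plural -s", "-ing", "3rd person -s", "past -ed"]
    print(f"RBO_ext = {rbo_ext(native_order, learner_order):.3f}")
```

A score of 1 means identical orders and 0 means no shared items at any depth; lowering `p` concentrates the weight on the first few morphemes in each order.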

Data

The data used in this project come from TalkBank. I focused on two corpora:

  1. CHILDES Frogs English Slobin Corpus (Native speakers)

  2. Vercellotti Corpus (Language learners)
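TalkBank transcripts are distributed in the CHAT format, where an optional %mor tier gives a morphological analysis of each utterance (for example `n|dog-PL`). Assuming the transcripts carry such tiers, a rough sketch of tallying suffix-marked morphemes is shown below, using only the Python standard library rather than a dedicated CHAT reader. The `MORPHEME_CODES` mapping is a hypothetical subset of Brown's (1973) morphemes, and the suffix codes are assumed MOR conventions; irregular forms (marked with `&`, as in `v|go&PAST`) and %mor tiers that wrap onto a continuation line are not handled.

```python
from collections import Counter

# Hypothetical mapping from a few Brown (1973) grammatical morphemes to the
# suffix codes assumed to mark them on CHAT %mor tiers (e.g. "v|walk-PAST").
MORPHEME_CODES = {
    "present progressive -ing": "PRESP",
    "plural -s": "PL",
    "regular past -ed": "PAST",
    "third person singular -s": "3S",
}


def count_morphemes(chat_text: str) -> Counter:
    """Count suffix-marked morphemes on the %mor tiers of a CHAT transcript.

    Only single-line %mor tiers and "-"-marked (regular) suffixes are counted.
    """
    counts: Counter = Counter()
    for line in chat_text.splitlines():
        if not line.startswith("%mor:"):
            continue
        # Each word on the tier is a token such as "n|dog-PL"; tokens are whitespace-separated.
        for token in line[len("%mor:"):].split():
            suffixes = token.split("-")[1:]  # everything after the stem
            for name, code in MORPHEME_CODES.items():
                if code in suffixes:
                    counts[name] += 1
    return counts


if __name__ == "__main__":
    # A made-up two-line CHAT excerpt used only to exercise the function.
    sample = (
        "*CHI:\tthe dogs are running .\n"
        "%mor:\tdet:art|the n|dog-PL aux|be&PRES v|run-PRESP .\n"
    )
    print(count_morphemes(sample))
```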

Directory

  • Final report provides a complete overview of this project and its outcomes.

  • notebooks contains the Jupyter Notebook files:

    • data_samples contains CSV files of the data frames used throughout the project.

    • visuals contains all visualizations generated during the project.

  • Final_Presentation.pdf contains the slides used in a presentation about this project on Apr. 14, 2023.

  • Progress report documents the progress made on this project over the semester.

  • Project plan is the original plan drawn up at the start of this project.

  • See the License for what you may and may not do with this project.

References

Brown, R. (1973). A first language. Cambridge, MA: Harvard University Press.

Jeong, D. B. (2002). Second language acquisition in childhood. Seoul, Korea: Kyungjin Publishing Co.

Juffs, A., Han, N-R., & Naismith, B. (2020). The University of Pittsburgh English Language Corpus (PELIC) [Data set]. http://doi.org/10.5281/zenodo.3991977

Krashen, S. D. (1985). The Input Hypothesis: Issues and implications. New York: Longman.

Lee, J. L., Burkholder, R., Flinn, G. B., & Coppess, E. R. (2016). Working with CHAT transcripts in Python (Technical Report TR-2016-02). Department of Computer Science, University of Chicago.

MacWhinney, B. (2000). The CHILDES Project: Tools for analyzing talk (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

Berman, R. A., & Slobin, D. I. (1994). Relating events in narrative: A crosslinguistic developmental study. Hillsdale, NJ: Lawrence Erlbaum Associates.

Vercellotti, M. L. (2017). The development of complexity, accuracy, and fluency in second language performance: A longitudinal study. Applied Linguistics, 38(1), 90-111.

Webber, W., Moffat, A., & Zobel, J. (2010). A similarity measure for indefinite rankings. ACM Transactions on Information Systems, 28(4), Article 20. https://doi.org/10.1145/1852102.1852106
