My original plan for this website was to keep track of thoughts and ideas as I navigate my doctoral program, which is deeply connected to STEM equity and EdTech-related issues. New ideas and potential research paths emerge every few days, only to be forgotten months later, much like the original idea for this website. By documenting these insights at the end of each day, I hope to create a roadmap for my future self and anyone interested in this intersection of fields.
Most of these posts will cover the use of Natural Language Processing (NLP) for qualitative research in education. Over the past year, I’ve been drawn to emergent qualitative methods, particularly those involving NLP. This journey began several years ago with proprietary software like LIWC-22, which scores text against dictionaries of psychologically meaningful word categories, and has evolved into hands-on programming with Python.
I’ve been exploring various Python packages:
- NLTK: A leading platform for building Python programs to work with human language data.
- spaCy: An open-source library for advanced NLP tasks.
- gensim: Useful for unsupervised topic modeling and natural language processing.
- BERTopic and Sentence-BERT (the sentence-transformers library): Transformer-based tools for topic modeling and semantic similarity, respectively.
Currently, I’m fixated on building a Jupyter Notebook qualitative research pipeline. This pipeline starts with Whisper, an automatic speech recognition system, which transcribes audio and video files into text; the transcript is then saved as a CSV file. This file becomes the foundation for further text analytics and machine learning processes such as:
- Sentiment Analysis: Determining the emotional tone behind the words.
- Topic Modeling: Uncovering hidden structures in text data.
- Semantic Similarity: Measuring the likeness between sentences or documents.
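To make the first stage of the pipeline concrete, here is a minimal sketch of the transcription-to-CSV step. It is an illustration under stated assumptions, not the exact notebook code: the function names (`segments_to_csv`, `transcribe_to_csv`), the column layout (`start`, `end`, `text`), and the `"base"` model size are all choices made for this example. It assumes the open-source `openai-whisper` package and ffmpeg are installed.

```python
import csv
from pathlib import Path


def segments_to_csv(segments, csv_path):
    """Write Whisper-style segments to a CSV file, one row per segment.

    Each segment is a dict with "start", "end" (seconds) and "text" keys,
    which is the shape of the "segments" list Whisper returns.
    The column names are an assumption made for this sketch.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["start", "end", "text"])
        for seg in segments:
            # Whisper segment text usually carries a leading space; strip it.
            writer.writerow([seg["start"], seg["end"], seg["text"].strip()])
    return Path(csv_path)


def transcribe_to_csv(media_path, csv_path, model_name="base"):
    """Transcribe one audio or video file and save the segments as CSV.

    whisper is imported lazily so that segments_to_csv can be used
    (and tested) without the package installed.
    """
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(str(media_path))
    return segments_to_csv(result["segments"], csv_path)
```

A call such as `transcribe_to_csv("interview_01.mp4", "interview_01.csv")` (hypothetical file names) would produce one row per spoken segment. Keeping the timestamps alongside the text means later steps such as sentiment analysis or topic modeling can be traced back to the moment in the recording they came from.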
By integrating these tools, I hope to streamline the research process and contribute meaningful insights into educational practices and policies. This approach is rooted in the idea of computational grounded theory, as described by Laura K. Nelson*. Computational grounded theory combines traditional qualitative methods with computational techniques, allowing researchers to analyze large volumes of textual data more efficiently while still maintaining the depth of qualitative analysis.
*Nelson, L. K. (2020). Computational grounded theory: A methodological framework. Sociological Methods & Research, 49(1), 3-42.