
My Resume

Work Experience

Language Instructor

Current
2025 - Present

Teaching language courses and developing curriculum materials.

Teaching, Curriculum Design

Freelance Translator & Researcher

Current
2015 - Present

Professional translation services and academic research in linguistics.

Translation, Research, Linguistics

Language Instructor

Sep 2024 - Nov 2024

Teaching language courses and developing curriculum materials.

Teaching, Curriculum Design

Lexicographer

2013 - 2015

Contributed to comprehensive Persian dictionary projects.

Lexicography, Dictionary

Copywriter

2010 - 2013

Created compelling content for various media and publications.

Writing, Content Creation

Education

Ph.D. in Linguistics

Doctorate
Tarbiat Modares University
Sep 2019 - May 2025
Thesis

Facing Persian in the Wild: Corpus Compilation, Annotation, Tag Encoding, and Processing of Digital Writing Register

The proliferation of digital communication technologies has profoundly reshaped the linguistic landscape, giving rise to a distinct digital writing register that now makes up a substantial part of daily interaction. Unlike traditional written registers governed by established prescriptive norms, this emergent register lacks a central regulatory authority, which positions it as a 'wild' register for linguistic inquiry. This characteristic calls for a critical re-evaluation of conventional methods of linguistic analysis and processing. This study proposes and applies a novel methodological framework designed specifically for this register. We revisit corpus compilation, preprocessing, encoding, and annotation, aiming to streamline fundamental processing tasks while preserving the linguistic and orthographic diversity characteristic of digital writing. The proposed approach deliberately retains variation during the initial preprocessing stages and defers word-identity disambiguation to later, more context-aware processing steps. We began by extracting a large-scale corpus, designated '1400', comprising one million randomly sampled tweets posted during the Persian calendar year 1400. A one-million-token sub-corpus was then isolated for detailed preprocessing, part-of-speech tagging, and lemmatization. During normalization and tokenization, we applied two main strategies to simplify processing: systematically removing elements that introduce conventional orthographic variation (such as the zero-width non-joiner) and segmenting multi-component elements whose components have independent grammatical identities. For corpus annotation, we introduce 'Tagframe', a novel and flexible framework for tag encoding.
The framework is designed to support granular representation of linguistic, paralinguistic, and intra-lexical information (such as clitic structures). Using Tagframe, we developed a standardized tagframe tailored to Persian digital text. This required revising certain traditional grammatical concepts and adding new categories and layered features that represent linguistic, intra-lexical (internal-structure), and paralinguistic characteristics. Lemmatization used diacritization to resolve homograph ambiguity. The final annotation uses 25 linguistic categories supplemented by 82 features across three layers: linguistic, intra-lexical, and paralinguistic. Statistical analysis of the annotated corpus, including comparison with other corpora, revealed a notably high frequency of hapax legomena, indicating the corpus's rich diversity, and a high frequency of verbs, suggesting contextual characteristics shared with spoken registers. The methodology and the resulting annotated corpus provide a foundational resource and a replicable workflow for building and processing similar corpora, enabling further linguistic and computational research on this and related registers.
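A minimal Python sketch of two steps named in this abstract: deleting the zero-width non-joiner (U+200C) during normalization, and measuring how many word types are hapax legomena. It assumes plain whitespace tokenization and defines the hapax ratio as hapaxes divided by types; the function names are hypothetical, and the thesis's actual segmentation rules and Tagframe encoding are not reproduced here.

```python
from collections import Counter
from typing import Iterable, List

ZWNJ = "\u200c"  # zero-width non-joiner


def normalize(text: str) -> str:
    """Delete ZWNJ so spellings with and without it share one surface form."""
    return text.replace(ZWNJ, "")


def tokenize(text: str) -> List[str]:
    """Hypothetical whitespace tokenizer (a stand-in for the thesis's segmentation)."""
    return text.split()


def hapax_ratio(texts: Iterable[str]) -> float:
    """Share of word types that occur exactly once (hapaxes / types)."""
    counts = Counter(tok for text in texts for tok in tokenize(normalize(text)))
    if not counts:
        return 0.0
    hapaxes = sum(1 for freq in counts.values() if freq == 1)
    return hapaxes / len(counts)


tweets = ["می\u200cروم خانه", "میروم مدرسه"]
print(hapax_ratio(tweets))  # 2 of 3 types are hapaxes
```

With this normalization, the spelling with ZWNJ and the one without it count as a single type, so the two example tweets yield three types, two of which occur only once.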

Master's Degree in Linguistics

Master's
Tarbiat Modares University
Sep 2013 - Jun 2016
Thesis

Inflection of Tense in Persian Verbs and How It Is Processed: A Psycholinguistics Perspective

In this research, we studied Persian inflectional tense forms and how they are processed. Because historical approaches dominate the study of Persian, we devoted the first two research questions to revising some of the concepts in this area. The first question concerned the categorization of Persian verbs by regularity. The results showed that Persian verbs fall into three categories: regular, irregular, and alternative. Regulars form an open category: they belong to neither of the other categories, and their past tense is produced by adding the regular suffix -id to the root. Irregulars form a closed category with no systematic relation between root and past-tense form; they are represented as a list of paired forms. Alternatives, a subtype of regular verbs, form an open category distinct from the irregulars: their roots consist of more than two syllables and end in -ɑn, and they use both -d and -id to produce their past-tense forms. The second question concerned the model that represents the process of tense inflection. The results showed that the present tense of both regulars and irregulars, as well as the past tense of most regulars, is produced by retrieving the root and adding the appropriate affixes, whereas the past tense of irregulars and of some high-frequency regulars is produced by retrieving the past stem from memory. The third question concerned the mental mechanisms involved in producing Persian verbs. Steven Pinker claims that the regular/irregular distinction is rooted in two distinct mechanisms: a rule mechanism that produces regulars and a rote mechanism that produces irregulars. The results of this research confirm Pinker's claim.
To address these three questions, we drew on a wide range of evidence, but the main source was an experiment in which participants were presented with the present-tense forms of verbs and asked to produce the corresponding past-tense forms as quickly as possible. We then collected the errors they made and used them as our primary data.
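The dual-mechanism (Words and Rules) account summarized in this abstract can be sketched in Python: a lookup in a small rote memory of stored past stems is tried first, and only if it fails does the rule attach the regular suffix -id to the root. The mini-lexicon, the simplified IPA-style transliteration, and the function names are illustrative assumptions, not the thesis's materials; alternative verbs (-d/-id) and glide insertion after vowel-final roots are not modeled.

```python
from typing import Dict

# Rote memory (hypothetical mini-lexicon): present root -> stored past stem.
# Per the abstract, irregulars live here, and so may some high-frequency regulars.
STORED_PAST_STEMS: Dict[str, str] = {
    "kon": "kærd",  # kardan 'to do'
    "xor": "xord",  # xordan 'to eat'
    "ræv": "ræft",  # raftan 'to go'
    "bin": "did",   # didan 'to see'
}

REGULAR_PAST_SUFFIX = "id"

# Personal endings; the simple past has a zero ending in the 3rd person singular.
PRESENT_ENDINGS: Dict[str, str] = {
    "1sg": "æm", "2sg": "i", "3sg": "æd",
    "1pl": "im", "2pl": "id", "3pl": "ænd",
}
PAST_ENDINGS: Dict[str, str] = {**PRESENT_ENDINGS, "3sg": ""}


def past_stem(root: str) -> str:
    """Memory first (rote); if no stored form exists, apply the rule root + -id."""
    stored = STORED_PAST_STEMS.get(root)
    if stored is not None:
        return stored
    return root + REGULAR_PAST_SUFFIX


def present_tense(root: str, person: str) -> str:
    """Present tense is rule-built for all verbs: mi- + root + personal ending."""
    return "mi" + root + PRESENT_ENDINGS[person]


def simple_past(root: str, person: str) -> str:
    """Simple past: past stem (retrieved or rule-built) + personal ending."""
    return past_stem(root) + PAST_ENDINGS[person]


print(present_tense("xor", "1sg"))  # mixoræm 'I eat'
print(simple_past("xor", "1sg"))    # xordæm 'I ate' (retrieved stem)
print(simple_past("pors", "3sg"))   # porsid 'he/she asked' (rule)
```

Checking memory before applying the rule mirrors the blocking idea in the Words and Rules account: a stored irregular stem pre-empts the regular -id form.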

Bachelor's Degree in Linguistics

Bachelor's
Shiraz University
Sep 2009 - Jun 2013

Publications

Encyclopedia Entries

2018
Encyclopedia of Persian Language and Literature (vol. VI)

Vafa Zavvarei

Academy of Persian Language and Literature Press

Dictionaries

2017
The Comprehensive Dictionary of Persian Language (vol. II)

Academy of Persian Language and Literature Press

Projects

CorpusLab

Active

CorpusLab is an online tool for statistical analysis of Persian corpora.

Corpus, NLP, Persian

PT1400

A corpus of one million Persian tweets from the year 1400 of the Solar Hijri (Shamsi) calendar (roughly March 2021 to March 2022). It includes 50,000 tweets that were manually normalized, tagged, and lemmatized, with annotation encoded using the Tagframe method.

Corpus, NLP, Persian

Psycholinguistic Study of Verb Inflection

Studying how regular and irregular verbs are produced and perceived, examining rule and rote mechanisms in verb inflection processing.

Psycholinguistics, Morphology

CPVI

Comprehensive Persian Verb Inflector: a tool that inflects Persian verbs based on the dual-mechanism (Words and Rules) theory.

Python, NLP, Tool

Phonotactics & Syllabification

Comprehensive study of phonotactics, phonostatistics, and syllabification of approximately 55,000 Persian words.

Phonology, Analysis

Enhanced Flexicon

An enhanced version of the Flexicon corpus, with IPA transcription, syllabification, added word-initial glottal stops, and various other improvements.

Corpus, Lexicon
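The syllabification and initial-glottal work in the two projects above can be illustrated with a short Python sketch. It assumes the common description of Persian syllables as CV(C)(C) with an obligatory single-consonant onset: each vowel takes the consonant right before it as onset, and any other consonants stay in the preceding coda. The six-vowel inventory, the treatment of only tʃ and dʒ as multi-character symbols, and the reading of "initial glottal" as inserting [ʔ] before vowel-initial words are assumptions, not the projects' actual code.

```python
from typing import List

VOWELS = {"i", "e", "æ", "ɑ", "o", "u"}  # assumed Persian vowel inventory
AFFRICATES = ("tʃ", "dʒ")                # two-character IPA symbols kept whole


def segment(ipa: str) -> List[str]:
    """Split an IPA string into phones, keeping tʃ and dʒ as single phones."""
    phones: List[str] = []
    i = 0
    while i < len(ipa):
        if ipa[i:i + 2] in AFFRICATES:
            phones.append(ipa[i:i + 2])
            i += 2
        else:
            phones.append(ipa[i])
            i += 1
    return phones


def add_initial_glottal(phones: List[str]) -> List[str]:
    """Insert [ʔ] before a vowel-initial word, since every syllable needs an onset."""
    if phones and phones[0] in VOWELS:
        return ["ʔ"] + phones
    return phones


def syllabify(ipa: str) -> List[str]:
    """Split into CV(C)(C) syllables: one onset consonant per vowel, rest to the coda."""
    phones = add_initial_glottal(segment(ipa))
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if not nuclei:
        return ["".join(phones)] if phones else []
    starts = [0]
    for n in nuclei[1:]:
        # The consonant right before the vowel opens the syllable; if that
        # phone is itself a vowel (unexpected hiatus), start at the vowel.
        starts.append(n - 1 if phones[n - 1] not in VOWELS else n)
    bounds = starts + [len(phones)]
    return ["".join(phones[a:b]) for a, b in zip(bounds, bounds[1:])]


print(syllabify("ketɑb"))   # ['ke', 'tɑb'] 'book'
print(syllabify("ændiʃe"))  # ['ʔæn', 'di', 'ʃe'] 'thought'
```

Because Persian allows only one onset consonant, a medial cluster such as /nd/ in ændiʃe splits as coda /n/ plus onset /d/.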

Certificates


Introduction to TensorFlow for AI, ML, and Deep Learning
DeepLearning.AI
Natural Language Processing in TensorFlow
DeepLearning.AI
NLP with Classification and Vector Spaces
DeepLearning.AI
Fine Tune BERT for Text Classification
Coursera
Introduction to NLP in Python
Coursera
Transfer Learning for NLP with TensorFlow Hub
Coursera
Named Entity Recognition using LSTMs with Keras
Coursera
Fake News Detection with Machine Learning
Coursera
Convolutions for Text Classification with Keras
Coursera
Tweet Emotion Recognition with TensorFlow
Coursera
NLP: Twitter Sentiment Analysis
Coursera
Basic Sentiment Analysis with TensorFlow
Coursera
The Data Scientist's Toolbox
Johns Hopkins University

Introduction to Git and GitHub
Google
Crash Course on Python
Google
Troubleshooting and Debugging Techniques
Google
Using Python to Interact with the Operating System
Google
Python for Everybody (Specialization)
University of Michigan
Python 3 Programming (Specialization)
University of Michigan
Introduction to HTML5
University of Michigan
Introduction to CSS3
University of Michigan
Introduction to Bash Shell Scripting
Coursera
Automation Scripts Using Bash
Coursera

Basic Statistics
University of Amsterdam
Quantitative Methods
University of Amsterdam
Data Science Math Skills
Duke University

The Bilingual Brain
University of Houston
Fundamental Neuroscience for Neuroimaging
Johns Hopkins University
Introduction to Psychology
University of Toronto
Big Data and Language 1
KAIST
Philosophy of Cognitive Sciences
University of Edinburgh