Creating Word Embeddings
with Brian Spiering
Word embedding is a set of relatively new techniques in natural language processing that have proven to be very useful and popular for making sense of textual data.
Using neural networks, mappings can be created between words (strings) and vectors (arrays of floats), which can then be used to boost performance on tasks such as sentiment analysis. This workshop will give a technical introduction to several word embedding algorithms: word2vec, GloVe, and doc2vec. After developing an understanding of the algorithms, you will have a chance to try them out on different text datasets.
Dr. Brian Spiering is a Professor of Computer Science at USF. He teaches humans the languages of computers (primarily Python) and teaches computers the languages of humans (through natural language processing and artificial intelligence). He is active in the San Francisco tech community as a volunteer and mentor at DataKind SF Bay and Delta Analytics.
About the Workshop
“You shall know a word by the company it keeps” is a common refrain in natural language processing. Word embeddings are trained with a simple neural network that learns which words tend to occur together and embeds the words in a meaningful real-valued vector space. From these word embeddings, it is possible to compare words with distance measures, add and subtract words to explore relationships between concepts, and use clustering to find semantically related words.
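As a rough illustration of comparing words and doing word arithmetic, here is a minimal sketch in plain Python. The three-dimensional vectors in `toy_vectors` are invented for this example and are not learned from any text; real embeddings are trained on a corpus and typically have many more dimensions. The helper names `cosine_similarity` and `nearest` are also hypothetical.

```python
import math

# Toy 3-dimensional vectors, made up purely for illustration.
# Real embeddings are learned from text, not written by hand.
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.1, 0.5, 0.4],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(vector, vectors, exclude=()):
    """Return the word whose vector is most similar to `vector`."""
    candidates = [word for word in vectors if word not in exclude]
    return max(candidates, key=lambda w: cosine_similarity(vector, vectors[w]))


# Word arithmetic: king - man + woman
analogy = [
    k - m + w
    for k, m, w in zip(toy_vectors["king"], toy_vectors["man"], toy_vectors["woman"])
]

# Exclude the input words so the query cannot simply return one of them.
result = nearest(analogy, toy_vectors, exclude={"king", "man", "woman"})
print(result)  # queen (with these hand-picked toy vectors)
```

The result only comes out as "queen" because the toy vectors were chosen that way; with trained embeddings the same arithmetic is applied to learned vectors, and the outcome depends on the training data.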
Creating word embeddings is often the first step in a machine learning pipeline for text data. In fact, word embedding algorithms are general-purpose algorithms that allow any sequential data to be encoded as meaningful vectors, including emoji! 💥
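To see why the approach is not limited to words, it may help to look at how training examples can be built. The sketch below generates (center, context) pairs from a sliding window, in the style of the skip-gram variant of word2vec. It only prepares the pairs and does not train a model. The function name `skipgram_pairs` and the `window` parameter are assumptions made for this example.

```python
def skipgram_pairs(sequence, window=2):
    """Pair each item with its neighbours within `window` positions.

    Works on any sequence of hashable tokens: words, emoji, product IDs, etc.
    """
    pairs = []
    for i, center in enumerate(sequence):
        start = max(0, i - window)
        end = min(len(sequence), i + window + 1)
        for j in range(start, end):
            if j != i:  # an item is not its own context
                pairs.append((center, sequence[j]))
    return pairs


# The same function handles words...
words = "you shall know a word".split()
print(skipgram_pairs(words, window=1))

# ...and emoji, since nothing in it depends on the tokens being words.
emoji = ["🌧", "☔", "🌈"]
print(skipgram_pairs(emoji, window=1))
```

Because the pairs depend only on which items appear near each other, any data that can be written as a sequence of tokens can be fed to the same kind of training procedure.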
Participants are expected to be confident programmers (at least 2 years of professional experience or equivalent is recommended). The exercises will be in Python. No math background is assumed beyond high school level, although some familiarity with linear algebra is advantageous (you may enjoy watching some of Grant Sanderson’s wonderful Essence of Linear Algebra videos beforehand).