Creating Word Embeddings
with Brian Spiering

Word embedding refers to a family of relatively new techniques in natural language processing (NLP) that have proven useful and popular for making sense of textual data.

These techniques use neural networks to learn mappings from words (strings) to vectors (arrays of floats). The vectors can then be used to boost the performance of downstream tasks such as sentiment analysis.

This workshop will give a technical introduction to several word embedding algorithms: word2vec, GloVe, and doc2vec. Once you understand how the algorithms work, you will have a chance to try them out on different text datasets.

About Brian

Dr. Brian Spiering is a Professor of Computer Science at USF. He teaches humans the languages of computers (primarily Python) and teaches computers the languages of humans (through natural language processing and artificial intelligence). He is active in the San Francisco tech community as a volunteer and mentor at DataKind SF Bay and Delta Analytics.

About the Workshop

“You shall know a word by the company it keeps” is a common refrain in natural language processing. Word embeddings are trained with a simple neural network that learns which words tend to occur together, and each word is embedded in a meaningful real-valued vector space. With these embeddings it is possible to compare words using distance measures, add and subtract word vectors to explore relationships between concepts, and use clustering to find semantically related words.
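To make "compare words with distance measures" and "add/subtract words" concrete, here is a minimal sketch in plain Python. The 4-dimensional vectors are hand-picked, hypothetical values chosen for illustration, not the output of word2vec or GloVe (trained embeddings typically have tens to hundreds of dimensions). The helper names `cosine_similarity` and `analogy` are our own, not part of any library. Cosine similarity is one common distance measure for embeddings. The analogy "man is to king as woman is to ?" is answered by finding the word nearest to the vector king - man + woman.

```python
import math

# Hypothetical, hand-picked 4-dimensional vectors (NOT trained embeddings).
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "apple": [0.0, 0.1, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, embeddings=EMBEDDINGS):
    """Answer 'a is to b as c is to ?' using the vector b - a + c.

    Returns the word (excluding the three inputs) whose vector has the
    highest cosine similarity to b - a + c.
    """
    target = [vb - va + vc for va, vb, vc in
              zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


if __name__ == "__main__":
    print(round(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"]), 3))  # prints 0.664
    # man is to king as woman is to ?
    print(analogy("man", "king", "woman"))  # prints queen
```

With real trained vectors the same two functions apply unchanged; only the lookup table grows.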

Creating word embeddings is often the first step in a machine learning pipeline for text data. In fact, word embedding algorithms are general-purpose algorithms that allow any sequential data to be encoded as meaningful vectors, including emoji! 💥
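One way to see why these algorithms work on any sequence is to look at the training data for word2vec's skip-gram variant: (center, context) pairs taken from a sliding window over a token sequence. The sketch below builds those pairs; nothing in it depends on the tokens being words. It is a simplified illustration under our own assumptions (a fixed window size, a hypothetical helper name `skipgram_pairs`). Real word2vec implementations add refinements such as negative sampling and randomly shrunk windows, and then train a network on these pairs.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token that lies within
    `window` positions of each center token (skip-gram training data)."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Any sequence of hashable tokens works: words or emoji.
    emoji_sequence = ["🍕", "🍺", "🎉", "💥"]
    for pair in skipgram_pairs(emoji_sequence, window=1):
        print(pair)
```

With `window=1`, each emoji is paired only with its immediate neighbours, so the four-emoji sequence yields six pairs.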

Participants are expected to be confident programmers (at least 2 years of professional experience, or equivalent, is recommended). The exercises will be in Python. No math background beyond high school level is assumed, although some familiarity with linear algebra is advantageous (you may enjoy watching some of Grant Sanderson’s wonderful Essence of Linear Algebra videos beforehand).

Date, Location and Tickets

This workshop will run 2pm-5pm on Saturday, the 3rd of March 2018, at Bradfield HQ, 576 Natoma Street, San Francisco. The cost of attendance is a tax-deductible donation of your choice (recommended amount $100, but please be generous if you can) to Delta Analytics, a nonprofit that provides data science services and training across the globe. Make a donation, then forward your receipt to [email protected] to confirm your place!


[email protected]
576 Natoma St
San Francisco, California
© 2016 Bradfield School of Computer Science