One Hot Encoding



=================================================================================

In natural language processing, words must be converted into a form that computers can process. One common technique is one-hot encoding. In one-hot encoding, each word is represented by a one-hot vector: a vector with exactly one element equal to 1 and all other elements equal to 0, whose length equals the number of words in the vocabulary. For example, for a vocabulary such as {"This", "is", "Yougui", "Liao"}, each word vector is 4-dimensional. The vector of "This" is [1, 0, 0, 0], the vector of "is" is [0, 1, 0, 0], and so on. However, if we compute the distance between "This" and "is", and between "is" and "Yougui", the distances are identical; one-hot vectors therefore cannot capture the real relationships between words. Word embeddings address this limitation: each element of an embedding vector is a real-valued number learned from data, so distances between word vectors can differ and reflect semantic similarity. For instance, for the same vocabulary {"This", "is", "Yougui", "Liao"}, the vector of "This" might look like [1.12, 1.42, 1.45, 1.52].
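The equal-distance property can be checked directly. The following is a minimal Python sketch (assuming NumPy is installed; the vocabulary is the one from the example above, and the embedding values are illustrative made-up numbers, not learned values):

import numpy as np

# Example 4-word vocabulary from the text above
vocab = ["This", "is", "Yougui", "Liao"]

def one_hot(word, vocab):
    # Build a one-hot vector: 1 at the word's index, 0 everywhere else
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

v_this = one_hot("This", vocab)      # [1, 0, 0, 0]
v_is = one_hot("is", vocab)          # [0, 1, 0, 0]
v_yougui = one_hot("Yougui", vocab)  # [0, 0, 1, 0]

# Every pair of distinct one-hot vectors is the same Euclidean
# distance apart (sqrt(2) = 1.414...), so no similarity is encoded.
print(np.linalg.norm(v_this - v_is))    # 1.4142...
print(np.linalg.norm(v_is - v_yougui))  # 1.4142...

# With real-valued embeddings (hypothetical numbers), distances can
# differ, so relationships between words can be represented.
embeddings = {
    "This":   np.array([1.12, 1.42, 1.45, 1.52]),
    "is":     np.array([1.10, 1.40, 1.50, 1.48]),
    "Yougui": np.array([0.05, 0.30, 2.10, 2.25]),
}
print(np.linalg.norm(embeddings["This"] - embeddings["is"]))    # ~0.07 (close)
print(np.linalg.norm(embeddings["is"] - embeddings["Yougui"]))  # ~1.81 (far)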

=================================================================================