A Little History of Word Representation
As a semi-computer scientist, I have always dreamed that we humans could make computers understand what we say. Instead of writing sophisticated code or using input devices (a mouse and a keyboard) to give instructions to computers, we could simply talk to a computer directly. The first step toward this science-fiction scene is to let the computer understand what we say – natural language – which is the ultimate goal of natural language processing (NLP) research. And the first step to making computers understand natural language is to represent its fundamental elements, words, in a form that computers can work with. This gives rise to a major research direction in NLP: word representations. This is the sub-area of NLP I am most interested in. In this series of posts, I am going to present some vital research work on this topic. ...
Sampling Trick
During my recent paper reading, I have found that we often need to estimate the size of a subset. This subset comes from an extremely large set, which means the subset itself may also be extremely large. The naive way to determine its size is to go through all of its elements; however, because of the extreme size, this is infeasible in practice. So what we need is a way to estimate the size of the subset quickly. One typical scenario is the ranking problem: for each instance, we need to know how many instances rank before it, and this operation is applied to every instance. Clearly, it is impossible to go through the whole training set each time. ...
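The idea above can be sketched with a simple Monte Carlo estimate: draw a small uniform sample from the large set, measure what fraction of the sample falls in the subset, and scale that fraction back up to the full set size. The function and variable names below are my own illustration, not from any specific paper:

```python
import random

def estimate_subset_size(population, predicate, num_samples=1000, seed=0):
    """Estimate |{x in population : predicate(x)}| by uniform sampling.

    Instead of scanning the whole population, draw `num_samples` elements
    with replacement, count how many satisfy `predicate`, and scale the
    hit rate by the population size.
    """
    rng = random.Random(seed)
    hits = sum(predicate(rng.choice(population)) for _ in range(num_samples))
    return len(population) * hits / num_samples

# Hypothetical ranking example: estimate how many scores exceed the
# current instance's score without a full pass over the data.
scores = list(range(1_000_000))   # one million distinct scores
current = 750_000
estimate = estimate_subset_size(scores, lambda s: s > current)
# The true count is 249,999; the estimate is close, up to sampling error.
```

The estimator is unbiased, and its standard error shrinks as the square root of the sample size, so a modest number of samples already gives a usable estimate regardless of how large the full set is.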