Rake_NLTK
- Python for Integrated Circuits -
- An Online Book -
Python for Integrated Circuits                                                                                   http://www.globalsino.com/ICs/        


=================================================================================

Rapid Automatic Keyword Extraction (RAKE) is a well-known, domain-independent keyword extraction algorithm in Natural Language Processing (page4315) that finds the most relevant words or phrases in a piece of text using a set of stop words and phrase delimiters. RAKE is an individual-document-oriented, dynamic information retrieval method. Rake NLTK is an expanded version of RAKE that is supported by NLTK. The steps for RAKE are:
          i) Initialize a word input list.
          ii) Split the input text content by dots (periods).
          iii) Create a matrix of word co-occurrences.
          iv) Word scoring, which can be calculated as the degree of a word in the matrix, as the word frequency, or as the degree of the word divided by its frequency.
          v) Keyphrases can also be created by combining the keywords. Phrases are obtained by splitting the text at punctuation and/or stop words.
          vi) A keyword or keyphrase is chosen if and only if its score belongs to the top T scores where T is the number of keywords you want to extract.
          vii) Python implementation of keyword extraction using Rake algorithm.
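
The steps above can be sketched in plain Python. This is a minimal illustration rather than the Rake NLTK implementation: the stop-word list is a tiny assumed subset, the helper names (candidate_phrases, word_scores, top_keyphrases) are hypothetical, and word scores use the degree-to-frequency ratio, which is one of the scoring options listed in step iv.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "with"}

def candidate_phrases(text):
    """Split text at punctuation, then at stop words; return lists of content words."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases

def word_scores(phrases):
    """Score each word as degree / frequency."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts every word in the phrase, including the word itself.
            degree[word] += len(phrase)
    return {w: degree[w] / freq[w] for w in freq}

def top_keyphrases(text, top_t=3):
    """Return the top_t phrases, each scored as the sum of its word scores."""
    phrases = candidate_phrases(text)
    scores = word_scores(phrases)
    ranked = {" ".join(p): sum(scores[w] for w in p) for p in phrases}
    return sorted(ranked.items(), key=lambda kv: (-kv[1], kv[0]))[:top_t]

text = ("Keyword extraction is a task in natural language processing. "
        "Keyword extraction finds keywords.")
for phrase, score in top_keyphrases(text, top_t=3):
    print(f"{score:5.1f}  {phrase}")
```

Because a phrase score is the sum of its word scores, longer phrases tend to rank higher, which is worth keeping in mind when choosing T.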

One of the critical points made by the creator of RAKE is that keywords frequently contain multiple words but rarely contain punctuation (e.g. period, comma, apostrophe, quotation mark, question mark, exclamation mark, brackets, braces, parentheses, dash, hyphen, ellipsis, colon, semicolon), stop words (e.g. the, is, and, not, that, there, are, many, can, you, with, one, of, those), or other words with minimal lexical meaning. Therefore, the "content words" are obtained by the equation below:
                  Content_Word = Corpus – Stopwords – Delimiter          [4307]
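
As a rough illustration of Equation 4307, the snippet below removes stop words and delimiters from a tokenized sentence. The stop-word set is a small assumed subset chosen for this example, and the delimiters are taken from Python's string.punctuation.

```python
import string

corpus = "RAKE finds the most relevant words, or phrases, in a piece of text."
stopwords = {"the", "most", "or", "in", "a", "of"}  # illustrative subset (assumption)
delimiters = set(string.punctuation)

# Pad each delimiter with spaces so that it becomes its own token.
spaced = "".join(f" {ch} " if ch in delimiters else ch for ch in corpus.lower())
tokens = spaced.split()

# Content_Word = Corpus - Stopwords - Delimiter
content_words = [t for t in tokens if t not in stopwords and t not in delimiters]
print(content_words)
```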

The concept of RAKE is built on three metrics:
          i) Word Degree (deg(w)),
          ii) Word Frequency (freq(w)),
          iii) Ratio of the degree to frequency (deg(w)/freq(w)).
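
To make the three quantities concrete, the sketch below builds an explicit word co-occurrence matrix for a few hand-written candidate phrases (assumed example data) and reads freq(w) from the diagonal and deg(w) from the row sum. It assumes that no word is repeated within a single phrase.

```python
# Hand-written candidate phrases (assumed example data).
phrases = [
    ["keyword", "extraction"],
    ["natural", "language", "processing"],
    ["keyword", "extraction", "algorithm"],
]

vocab = sorted({w for p in phrases for w in p})
index = {w: i for i, w in enumerate(vocab)}

# matrix[i][j] counts how often word i and word j appear in the same phrase.
matrix = [[0] * len(vocab) for _ in vocab]
for phrase in phrases:
    for w1 in phrase:
        for w2 in phrase:
            matrix[index[w1]][index[w2]] += 1

metrics = {}
for w in vocab:
    row = matrix[index[w]]
    freq = row[index[w]]   # diagonal entry: number of occurrences of w
    deg = sum(row)         # row sum: co-occurrences, including w with itself
    metrics[w] = (deg, freq, deg / freq)
    print(f"{w:12s} deg={deg} freq={freq} deg/freq={deg / freq:.2f}")
```

Here "keyword" has deg = 5 and freq = 2, giving a ratio of 2.5, while "algorithm" occurs once in a three-word phrase and scores 3.0. The ratio therefore favors words that mostly appear inside longer phrases.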

============================================

Rapid Automatic Keyword Extraction (RAKE). In the program, Word Frequency is the number of occurrences of a word, Degree of Word is the sum of the co-occurrences of a word with other words in the same phrases, and Degree Score is equal to (Degree of Word)/(Word Frequency). Code:
         [image: code listing]
Input text in the code is:
         [image: input text]
Output (error: two identical entries are counted separately):
         [image: output]
Marked phrases (the phrase accuracy is not good):
         [image: marked phrases]

=================================================================================