
NLP

Notes on NLP

1. Jaccard index = similarity calculation algorithm

 

Wikipedia : en.wikipedia.org/wiki/Jaccard_index

 

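A minimal sketch of the Jaccard index, |A ∩ B| / |A ∪ B|, applied to words. Splitting a word into character bigrams (the `char_bigrams` helper) is an assumption made here for illustration, and so is returning 1.0 when both sets are empty; the note itself does not say how the sets are built.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B| for two sets."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def char_bigrams(word: str) -> set:
    """Hypothetical helper: the set of adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


# 'night' -> {ni, ig, gh, ht}, 'nacht' -> {na, ac, ch, ht}; only 'ht' is shared.
print(jaccard_index(char_bigrams("night"), char_bigrams("nacht")))
```

The score ranges from 0 (no shared elements) to 1 (identical sets), so a higher value means more similar.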

 

2. Levenshtein distance = edit distance algorithm

 

Wikipedia : en.wikipedia.org/wiki/Levenshtein_distance

 

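A rough sketch of the Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. This is the usual dynamic-programming version, keeping only two rows of the table; the function name `levenshtein` is just a label chosen here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Unlike the Jaccard index, this is a distance: 0 means identical strings, and larger values mean less similar.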

 

3. Euclidean vector distance

 

Wikipedia : en.wikipedia.org/wiki/Euclidean_vector

 

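A small sketch of the Euclidean distance between two vectors, i.e. the length of their difference. How a word becomes a vector is not specified in this note; the character-count vectors built by `char_count_vectors` are purely an assumption for illustration (in practice the vectors might be embeddings or TF-IDF weights).

```python
import math
from collections import Counter


def euclidean_distance(u, v) -> float:
    """Square root of the summed squared differences of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def char_count_vectors(a: str, b: str):
    """Hypothetical vectorization: per-character counts over a shared alphabet."""
    alphabet = sorted(set(a) | set(b))
    count_a, count_b = Counter(a), Counter(b)
    return ([count_a[ch] for ch in alphabet],
            [count_b[ch] for ch in alphabet])


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(euclidean_distance(*char_count_vectors("bar", "bat")))
```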

 

=> Measures like the three above can handle typos and spelling mistakes to some extent.

=> However, they can also wrongly judge two completely different words to be close. (ex : 'bar' , bar')
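A quick illustration of both points, using word pairs picked here as an example (they are not from the original note). It computes a Jaccard score over the set of characters in each word, which is one assumed way of applying the measure: a misspelling scores as a perfect match, but so does an unrelated anagram.

```python
def char_set_jaccard(a: str, b: str) -> float:
    """Jaccard index over the sets of characters in two words."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty strings count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# A typo is tolerated: same characters, slightly different order.
print(char_set_jaccard("receive", "recieve"))  # 1.0

# But a different word with the same characters also looks identical.
print(char_set_jaccard("listen", "silent"))    # 1.0
```

A purely surface-level measure cannot tell these two cases apart, since it only sees characters and not meaning.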

 
