Title: | String Similarity Computation Using 'RapidFuzz' |
---|---|
Description: | Provides a high-performance interface for calculating string similarities and distances, leveraging the efficient library 'RapidFuzz' <https://github.com/rapidfuzz/rapidfuzz-cpp>. This package integrates the 'C++' implementation, allowing 'R' users to access cutting-edge algorithms for fuzzy matching and text analysis. |
Authors: | Andre Leite [aut, cre], Hugo Vaconcelos [aut], Max Bachmann [ctb], Adam Cohen [ctb] |
Maintainer: | Andre Leite <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0 |
Built: | 2024-12-17 01:23:10 UTC |
Source: | https://github.com/strategicprojects/rapidfuzz |
Calculate the Damerau-Levenshtein distance between two strings.
Computes the Damerau-Levenshtein distance, which is an edit distance allowing transpositions in addition to substitutions, insertions, and deletions.
damerau_levenshtein_distance(s1, s2, score_cutoff = NULL)
s1 |
A string. The first input string. |
s2 |
A string. The second input string. |
score_cutoff |
An optional maximum threshold for the distance. If NULL (the default), the largest integer value in R ('.Machine$integer.max') is used. |
The Damerau-Levenshtein distance as an integer.
damerau_levenshtein_distance("abcdef", "abcfde")
damerau_levenshtein_distance("abcdef", "abcfde", score_cutoff = 3)
Calculate the normalized Damerau-Levenshtein distance between two strings.
Computes the normalized Damerau-Levenshtein distance, where the result is between 0.0 (identical) and 1.0 (completely different).
damerau_levenshtein_normalized_distance(s1, s2, score_cutoff = 1)
s1 |
A string. The first input string. |
s2 |
A string. The second input string. |
score_cutoff |
An optional maximum threshold for the normalized distance. Defaults to 1.0. |
The normalized Damerau-Levenshtein distance as a double.
damerau_levenshtein_normalized_distance("abcdef", "abcfde")
damerau_levenshtein_normalized_distance("abcdef", "abcfde", score_cutoff = 0.5)
Calculate the normalized Damerau-Levenshtein similarity between two strings.
Computes the normalized similarity based on the Damerau-Levenshtein metric, where the result is between 0.0 (completely different) and 1.0 (identical).
damerau_levenshtein_normalized_similarity(s1, s2, score_cutoff = 0)
s1 |
A string. The first input string. |
s2 |
A string. The second input string. |
score_cutoff |
An optional minimum threshold for the normalized similarity. Defaults to 0.0. |
The normalized Damerau-Levenshtein similarity as a double.
damerau_levenshtein_normalized_similarity("abcdef", "abcfde")
damerau_levenshtein_normalized_similarity("abcdef", "abcfde", score_cutoff = 0.7)
Calculate the Damerau-Levenshtein similarity between two strings.
Computes the similarity based on the Damerau-Levenshtein metric, which considers transpositions in addition to substitutions, insertions, and deletions.
damerau_levenshtein_similarity(s1, s2, score_cutoff = 0L)
s1 |
A string. The first input string. |
s2 |
A string. The second input string. |
score_cutoff |
An optional minimum threshold for the similarity score. Defaults to 0. |
The Damerau-Levenshtein similarity as an integer.
damerau_levenshtein_similarity("abcdef", "abcfde")
damerau_levenshtein_similarity("abcdef", "abcfde", score_cutoff = 3)
Applies edit operations to transform a string.
editops_apply_str(editops, s1, s2)
editops |
A data frame of edit operations (type, src_pos, dest_pos). |
s1 |
The source string. |
s2 |
The target string. |
The transformed string.
Applies edit operations to transform a string.
editops_apply_vec(editops, s1, s2)
editops |
A data frame of edit operations (type, src_pos, dest_pos). |
s1 |
The source string. |
s2 |
The target string. |
A character vector representing the transformed string.
Compares a query string to all strings in a list of choices and returns the best match with a similarity score above the score_cutoff.
extract_best_match(query, choices, score_cutoff = 50, processor = TRUE)
query |
The query string to compare. |
choices |
A vector of strings to compare against the query. |
score_cutoff |
A numeric value specifying the minimum similarity score (default is 50.0). |
processor |
A boolean indicating whether to preprocess strings before comparison (default is TRUE). |
A list containing the best matching string and its similarity score.
Compares a query string to a list of choices using the specified scorer and returns the top matches with a similarity score above the cutoff.
extract_matches(query, choices, score_cutoff = 50, limit = 3L, processor = TRUE, scorer = "WRatio")
query |
The query string to compare. |
choices |
A vector of strings to compare against the query. |
score_cutoff |
A numeric value specifying the minimum similarity score (default is 50.0). |
limit |
The maximum number of matches to return (default is 3). |
processor |
A boolean indicating whether to preprocess strings before comparison (default is TRUE). |
scorer |
A string specifying the similarity scoring method ("WRatio", "Ratio", "PartialRatio", etc.). |
A data frame containing the top matched strings and their similarity scores.
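The extraction flow described above (score every choice, drop those below the cutoff, return the best few) can be sketched in base R. This is only an illustration under stated assumptions: `simple_score` is a hypothetical stand-in scorer built on `utils::adist()`, the column names `choice` and `score` are invented for the example, and the package's own scorers and output layout may differ.

```r
# Hypothetical stand-in scorer: edit-distance similarity on a 0-100 scale.
# The package's scorers ("WRatio", "Ratio", ...) are computed differently.
simple_score <- function(s1, s2) {
  longest <- max(nchar(s1), nchar(s2))
  if (longest == 0) return(100)
  100 * (1 - as.numeric(utils::adist(s1, s2)) / longest)
}

extract_sketch <- function(query, choices, score_cutoff = 50, limit = 3L,
                           processor = TRUE, scorer = simple_score) {
  # Assumed preprocessing: trim whitespace and lowercase.
  prep <- if (processor) function(x) tolower(trimws(x)) else identity
  scores <- vapply(choices, function(ch) scorer(prep(query), prep(ch)),
                   numeric(1), USE.NAMES = FALSE)
  out <- data.frame(choice = choices, score = scores, stringsAsFactors = FALSE)
  out <- out[out$score >= score_cutoff, , drop = FALSE]  # apply the cutoff
  out <- out[order(-out$score), , drop = FALSE]          # best match first
  head(out, limit)                                       # keep at most `limit` rows
}

res <- extract_sketch(
  "new york jets",
  c("Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys")
)
res
```

With the stand-in scorer, only the two "New York" teams pass the default cutoff of 50, and the exact match is listed first.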
Compares a query string to all strings in a list of choices and returns all elements with a similarity score above the score_cutoff.
extract_similar_strings(query, choices, score_cutoff = 50, processor = TRUE)
query |
The query string to compare. |
choices |
A vector of strings to compare against the query. |
score_cutoff |
A numeric value specifying the minimum similarity score (default is 50.0). |
processor |
A boolean indicating whether to preprocess strings before comparison (default is TRUE). |
A data frame containing matched strings and their similarity scores.
Calculates a partial ratio between two strings, which ignores long mismatching substrings.
fuzz_partial_ratio(s1, s2, score_cutoff = 0)
s1 |
First string. |
s2 |
Second string. |
score_cutoff |
Optional score cutoff threshold (default: 0.0). |
A double representing the partial ratio between the two strings.
fuzz_partial_ratio("this is a test", "this is a test!")
Calculates a quick ratio using fuzz ratio.
fuzz_QRatio(s1, s2, score_cutoff = 0)
s1 |
First string. |
s2 |
Second string. |
score_cutoff |
Optional score cutoff threshold (default: 0.0). |
A double representing the quick ratio between the two strings.
fuzz_QRatio("this is a test", "this is a test!")
Calculates a simple ratio between two strings.
fuzz_ratio(s1, s2, score_cutoff = 0)
s1 |
First string. |
s2 |
Second string. |
score_cutoff |
Optional score cutoff threshold (default: 0.0). |
A double representing the ratio between the two strings.
fuzz_ratio("this is a test", "this is a test!")
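To show what this score measures, the sketch below reproduces the example in base R. It assumes the upstream RapidFuzz definition, in which the simple ratio is the normalized Indel similarity scaled to 0-100. `ratio_sketch` is a hypothetical helper and not part of the package.

```r
ratio_sketch <- function(s1, s2) {
  total <- nchar(s1) + nchar(s2)
  if (total == 0) return(100)
  # With substitutions priced at 2 (one deletion plus one insertion), the
  # weighted distance from utils::adist() equals the Indel distance.
  indel <- as.numeric(utils::adist(
    s1, s2,
    costs = c(insertions = 1, deletions = 1, substitutions = 2)
  ))
  100 * (1 - indel / total)
}

ratio_sketch("this is a test", "this is a test!")  # 100 * 28 / 29, about 96.55
```

The two strings differ by a single inserted character out of 29 characters in total, which gives the score shown in the comment.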
Calculates the maximum ratio of token set ratio and token sort ratio.
fuzz_token_ratio(s1, s2, score_cutoff = 0)
s1 |
First string. |
s2 |
Second string. |
score_cutoff |
Optional score cutoff threshold (default: 0.0). |
A double representing the combined token ratio between the two strings.
fuzz_token_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
Compares the unique and common words in the strings and calculates the ratio.
fuzz_token_set_ratio(s1, s2, score_cutoff = 0)
s1 |
First string. |
s2 |
Second string. |
score_cutoff |
Optional score cutoff threshold (default: 0.0). |
A double representing the token set ratio between the two strings.
fuzz_token_set_ratio("fuzzy wuzzy was a bear", "fuzzy fuzzy was a bear")
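The decomposition into common and unique words can be illustrated in base R. This sketch only shows the token sets for the example strings. It assumes whitespace tokenization, and `unique_tokens` is a hypothetical helper. How the package combines these sets into a final score is not reproduced here.

```r
# Split on whitespace and keep each word once (assumed tokenization).
unique_tokens <- function(s) unique(strsplit(trimws(s), "[[:space:]]+")[[1]])

t1 <- unique_tokens("fuzzy wuzzy was a bear")
t2 <- unique_tokens("fuzzy fuzzy was a bear")

common  <- intersect(t1, t2)  # "fuzzy" "was" "a" "bear"
only_s1 <- setdiff(t1, t2)    # "wuzzy"
only_s2 <- setdiff(t2, t1)    # character(0)
```

Every word of the second string also occurs in the first, so the second string contributes no unique tokens.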
Sorts the words in the strings and calculates the ratio between them.
fuzz_token_sort_ratio(s1, s2, score_cutoff = 0)
s1 |
First string. |
s2 |
Second string. |
score_cutoff |
Optional score cutoff threshold (default: 0.0). |
A double representing the token sort ratio between the two strings.
fuzz_token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
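The word-sorting step can be sketched in base R. The example assumes whitespace tokenization and a byte-wise sort order. `sort_tokens` is a hypothetical helper, and the package may tokenize or order words differently.

```r
sort_tokens <- function(s) {
  tokens <- strsplit(trimws(s), "[[:space:]]+")[[1]]
  # method = "radix" sorts in C-locale order, independent of the session locale.
  paste(sort(tokens, method = "radix"), collapse = " ")
}

a <- sort_tokens("fuzzy wuzzy was a bear")  # "a bear fuzzy was wuzzy"
b <- sort_tokens("wuzzy fuzzy was a bear")  # "a bear fuzzy was wuzzy"
identical(a, b)                             # TRUE
```

Once sorted, the two example strings are identical, so any ratio computed on them reports a perfect match regardless of the original word order.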
Calculates a weighted ratio based on other ratio algorithms.
fuzz_WRatio(s1, s2, score_cutoff = 0)
s1 |
First string. |
s2 |
Second string. |
score_cutoff |
Optional score cutoff threshold (default: 0.0). |
A double representing the weighted ratio between the two strings.
fuzz_WRatio("this is a test", "this is a test!")
Generates edit operations between two strings.
get_editops(s1, s2)
s1 |
The source string. |
s2 |
The target string. |
A data frame of edit operations.
Calculates the Hamming distance between two strings.
hamming_distance(s1, s2, pad = TRUE)
s1 |
The first string. |
s2 |
The second string. |
pad |
If true, the strings are padded to the same length (default: TRUE). |
An integer representing the Hamming distance.
hamming_distance("karolin", "kathrin")
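For reference, the Hamming distance with padding can be sketched in a few lines of base R. `hamming_sketch` is a hypothetical helper written for illustration. The error raised for unequal lengths when `pad = FALSE` is an assumption about the intended behavior, not a documented guarantee of the package.

```r
hamming_sketch <- function(s1, s2, pad = TRUE) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  if (!pad && length(a) != length(b)) {
    # Assumption: unequal lengths are rejected when padding is disabled.
    stop("strings must have equal length when pad = FALSE")
  }
  n <- min(length(a), length(b))
  mismatches <- sum(a[seq_len(n)] != b[seq_len(n)])
  # Padding: each position beyond the end of the shorter string is a mismatch.
  mismatches + abs(length(a) - length(b))
}

hamming_sketch("karolin", "kathrin")  # 3
hamming_sketch("abc", "abcde")        # 2
```

Dividing the first result by the longer string length gives a normalized distance of 3/7 for the example pair.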
Calculates the normalized Hamming distance between two strings.
hamming_normalized_distance(s1, s2, pad = TRUE)
s1 |
The first string. |
s2 |
The second string. |
pad |
If true, the strings are padded to the same length (default: TRUE). |
A value between 0 and 1 representing the normalized distance.
hamming_normalized_distance("karolin", "kathrin")
Calculates the normalized Hamming similarity between two strings.
hamming_normalized_similarity(s1, s2, pad = TRUE)
s1 |
The first string. |
s2 |
The second string. |
pad |
If true, the strings are padded to the same length (default: TRUE). |
A value between 0 and 1 representing the normalized similarity.
hamming_normalized_similarity("karolin", "kathrin")
Measures the similarity between two strings using the Hamming metric.
hamming_similarity(s1, s2, pad = TRUE)
s1 |
The first string. |
s2 |
The second string. |
pad |
If true, the strings are padded to the same length (default: TRUE). |
An integer representing the similarity.
hamming_similarity("karolin", "kathrin")
Calculates the insertion/deletion (Indel) distance between two strings.
indel_distance(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A numeric value representing the Indel distance.
indel_distance("kitten", "sitting")
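The Indel distance counts only insertions and deletions, so it can be derived from the length of the longest common subsequence (LCS). The base-R sketch below illustrates that relationship. `lcs_length` and `indel_sketch` are hypothetical helpers, not the package's implementation.

```r
# Length of the longest common subsequence of two character vectors,
# computed row by row with dynamic programming.
lcs_length <- function(a, b) {
  prev <- integer(length(b) + 1)
  for (i in seq_along(a)) {
    cur <- integer(length(b) + 1)
    for (j in seq_along(b)) {
      cur[j + 1] <- if (a[i] == b[j]) prev[j] + 1L else max(prev[j + 1], cur[j])
    }
    prev <- cur
  }
  prev[length(b) + 1]
}

indel_sketch <- function(s1, s2) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  # Characters outside the LCS must be deleted from s1 or inserted from s2.
  length(a) + length(b) - 2L * lcs_length(a, b)
}

indel_sketch("kitten", "sitting")  # 5
```

The LCS of the example pair is "ittn" (length 4), so 6 + 7 - 2 * 4 = 5 edits are needed.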
Calculates the normalized insertion/deletion (Indel) distance between two strings.
indel_normalized_distance(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A numeric value between 0 and 1 representing the normalized Indel distance.
indel_normalized_distance("kitten", "sitting")
Calculates the normalized insertion/deletion (Indel) similarity between two strings.
indel_normalized_similarity(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A numeric value between 0 and 1 representing the normalized Indel similarity.
indel_normalized_similarity("kitten", "sitting")
Calculates the insertion/deletion (Indel) similarity between two strings.
indel_similarity(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A numeric value representing the Indel similarity.
indel_similarity("kitten", "sitting")
Calculates the Jaro distance between two strings, a value between 0 and 1.
jaro_distance(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A numeric value representing the Jaro distance.
jaro_distance("kitten", "sitting")
Calculates the normalized Jaro distance between two strings, a value between 0 and 1.
jaro_normalized_distance(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A numeric value representing the normalized Jaro distance.
jaro_normalized_distance("kitten", "sitting")
Calculates the normalized Jaro similarity between two strings, a value between 0 and 1.
jaro_normalized_similarity(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A numeric value representing the normalized Jaro similarity.
jaro_normalized_similarity("kitten", "sitting")
Calculates the Jaro similarity between two strings, a value between 0 and 1.
jaro_similarity(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A numeric value representing the Jaro similarity.
jaro_similarity("kitten", "sitting")
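The textbook Jaro similarity can be sketched in base R as follows. `jaro_sketch` is a hypothetical helper that follows the standard definition (matches within a sliding window, then half the number of out-of-order matches). It is not the package's optimized implementation, and edge cases may be handled differently there.

```r
jaro_sketch <- function(s1, s2) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  la <- length(a); lb <- length(b)
  if (la == 0 && lb == 0) return(1)
  if (la == 0 || lb == 0) return(0)
  # Characters match only if they are at most `window` positions apart.
  window <- max(0, max(la, lb) %/% 2 - 1)
  a_matched <- logical(la)
  b_matched <- logical(lb)
  for (i in seq_len(la)) {
    lo <- max(1, i - window)
    hi <- min(lb, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!b_matched[j] && a[i] == b[j]) {
        a_matched[i] <- TRUE
        b_matched[j] <- TRUE
        break
      }
    }
  }
  m <- sum(a_matched)
  if (m == 0) return(0)
  # Half the number of matched characters that appear in a different order.
  transpositions <- sum(a[a_matched] != b[b_matched]) / 2
  (m / la + m / lb + (m - transpositions) / m) / 3
}

jaro_sketch("kitten", "sitting")  # (4/6 + 4/7 + 1) / 3, about 0.746
jaro_sketch("MARTHA", "MARHTA")   # 17/18, about 0.944
```

The corresponding Jaro distance is one minus this similarity.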
Calculates the Jaro-Winkler distance between two strings.
jaro_winkler_distance(s1, s2, prefix_weight = 0.1)
s1 |
The first string. |
s2 |
The second string. |
prefix_weight |
The weight applied to the prefix (default: 0.1). |
A numeric value representing the Jaro-Winkler distance.
jaro_winkler_distance("kitten", "sitting")
Calculates the normalized Jaro-Winkler distance between two strings.
jaro_winkler_normalized_distance(s1, s2, prefix_weight = 0.1)
s1 |
The first string. |
s2 |
The second string. |
prefix_weight |
The weight applied to the prefix (default: 0.1). |
A numeric value representing the normalized Jaro-Winkler distance.
jaro_winkler_normalized_distance("kitten", "sitting")
Calculates the normalized Jaro-Winkler similarity between two strings.
jaro_winkler_normalized_similarity(s1, s2, prefix_weight = 0.1)
s1 |
The first string. |
s2 |
The second string. |
prefix_weight |
The weight applied to the prefix (default: 0.1). |
A numeric value representing the normalized Jaro-Winkler similarity.
jaro_winkler_normalized_similarity("kitten", "sitting")
Calculates the Jaro-Winkler similarity between two strings.
jaro_winkler_similarity(s1, s2, prefix_weight = 0.1)
s1 |
The first string. |
s2 |
The second string. |
prefix_weight |
The weight applied to the prefix (default: 0.1). |
A numeric value representing the Jaro-Winkler similarity.
jaro_winkler_similarity("kitten", "sitting")
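The role of `prefix_weight` can be illustrated with the classic Winkler adjustment, which raises a Jaro score according to the length of the shared prefix. The sketch below is hypothetical: `winkler_boost` takes an already computed Jaro score as input, caps the prefix at four characters as in the textbook definition, and omits any score threshold that an implementation might apply before boosting.

```r
winkler_boost <- function(jaro, s1, s2, prefix_weight = 0.1, max_prefix = 4L) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  n <- min(length(a), length(b), max_prefix)
  same <- a[seq_len(n)] == b[seq_len(n)]
  # Length of the common prefix, capped at max_prefix characters.
  prefix_len <- if (all(same)) n else which(!same)[1] - 1L
  jaro + prefix_len * prefix_weight * (1 - jaro)
}

# "MARTHA" and "MARHTA" have a Jaro similarity of 17/18 and share "MAR".
winkler_boost(17 / 18, "MARTHA", "MARHTA")  # 17/18 + 0.3 * (1/18), about 0.961
```

A larger `prefix_weight` rewards shared prefixes more strongly. With the four-character cap, values above 0.25 can push the result past 1.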
Calculates the LCSseq (Longest Common Subsequence) distance between two strings.
lcs_seq_distance(s1, s2, score_cutoff = NULL)
s1 |
The first string. |
s2 |
The second string. |
score_cutoff |
Score threshold to stop calculation. Default is the maximum possible value. |
A numeric value representing the LCSseq distance.
lcs_seq_distance("kitten", "sitting")
Calculates the edit operations required to transform one string into another.
lcs_seq_editops(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A data.frame containing the edit operations (substitutions, insertions, and deletions).
lcs_seq_editops("kitten", "sitting")
Calculates the normalized LCSseq distance between two strings.
lcs_seq_normalized_distance(s1, s2, score_cutoff = 1)
s1 |
The first string. |
s2 |
The second string. |
score_cutoff |
Score threshold to stop calculation. Default is 1.0. |
A numeric value representing the normalized LCSseq distance.
lcs_seq_normalized_distance("kitten", "sitting")
Calculates the normalized LCSseq similarity between two strings.
lcs_seq_normalized_similarity(s1, s2, score_cutoff = 0)
s1 |
The first string. |
s2 |
The second string. |
score_cutoff |
Score threshold to stop calculation. Default is 0.0. |
A numeric value representing the normalized LCSseq similarity.
lcs_seq_normalized_similarity("kitten", "sitting")
Calculates the LCSseq similarity between two strings.
lcs_seq_similarity(s1, s2, score_cutoff = 0L)
s1 |
The first string. |
s2 |
The second string. |
score_cutoff |
Score threshold to stop calculation. Default is 0. |
A numeric value representing the LCSseq similarity.
lcs_seq_similarity("kitten", "sitting")
Calculates the Levenshtein distance between two strings, which represents the minimum number of insertions, deletions, and substitutions required to transform one string into the other.
levenshtein_distance(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A numeric value representing the Levenshtein distance.
levenshtein_distance("kitten", "sitting")
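The minimum number of insertions, deletions, and substitutions can be computed with the standard dynamic-programming recurrence. The base-R sketch below is for illustration only. `levenshtein_sketch` is a hypothetical helper, and its result is cross-checked against `utils::adist()` from base R.

```r
levenshtein_sketch <- function(s1, s2) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  # prev[j + 1]: distance between the processed prefix of s1 and the
  # first j characters of s2.
  prev <- 0:length(b)
  for (i in seq_along(a)) {
    cur <- c(i, integer(length(b)))
    for (j in seq_along(b)) {
      cost <- if (a[i] == b[j]) 0L else 1L
      cur[j + 1] <- min(prev[j + 1] + 1L,  # deletion
                        cur[j] + 1L,       # insertion
                        prev[j] + cost)    # substitution or match
    }
    prev <- cur
  }
  prev[length(b) + 1]
}

d <- levenshtein_sketch("kitten", "sitting")  # 3
d == as.numeric(utils::adist("kitten", "sitting"))  # TRUE
d / max(nchar("kitten"), nchar("sitting"))    # 3/7 when divided by the longer length
```

The three edits for the example pair are k to s, e to i, and the insertion of a final g.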
The normalized Levenshtein distance is the Levenshtein distance divided by the maximum length of the compared strings, returning a value between 0 and 1.
levenshtein_normalized_distance(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A numeric value representing the normalized Levenshtein distance.
levenshtein_normalized_distance("kitten", "sitting")
The normalized Levenshtein similarity returns a value between 0 and 1, indicating how similar the compared strings are.
levenshtein_normalized_similarity(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A numeric value representing the normalized Levenshtein similarity.
levenshtein_normalized_similarity("kitten", "sitting")
Levenshtein similarity measures how similar two strings are, based on the minimum number of operations required to make them identical.
levenshtein_similarity(s1, s2)
s1 |
The first string. |
s2 |
The second string. |
A numeric value representing the Levenshtein similarity.
levenshtein_similarity("kitten", "sitting")
Applies opcodes to transform a string.
opcodes_apply_str(opcodes, s1, s2)
opcodes |
A data frame of opcode transformations (type, src_begin, src_end, dest_begin, dest_end). |
s1 |
The source string. |
s2 |
The target string. |
The transformed string.
Applies opcodes to transform a string.
opcodes_apply_vec(opcodes, s1, s2)
opcodes |
A data frame of opcode transformations (type, src_begin, src_end, dest_begin, dest_end). |
s1 |
The source string. |
s2 |
The target string. |
A character vector representing the transformed string.
Calculates the OSA distance between two strings.
osa_distance(s1, s2, score_cutoff = NULL)
s1 |
A string to compare. |
s2 |
Another string to compare. |
score_cutoff |
A threshold for the distance score (default is the maximum possible size_t value). |
An integer representing the OSA distance.
osa_distance("string1", "string2")
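Optimal String Alignment (OSA) extends the Levenshtein recurrence with swaps of two adjacent characters, under the restriction that no substring is edited more than once. The base-R sketch below follows that textbook definition. `osa_sketch` is a hypothetical helper, not the package's implementation.

```r
osa_sketch <- function(s1, s2) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  la <- length(a); lb <- length(b)
  # d[i + 1, j + 1]: distance between the first i characters of s1
  # and the first j characters of s2.
  d <- matrix(0L, la + 1, lb + 1)
  d[, 1] <- 0:la
  d[1, ] <- 0:lb
  for (i in seq_len(la)) {
    for (j in seq_len(lb)) {
      cost <- if (a[i] == b[j]) 0L else 1L
      best <- min(d[i, j + 1] + 1L,  # deletion
                  d[i + 1, j] + 1L,  # insertion
                  d[i, j] + cost)    # substitution or match
      if (i > 1 && j > 1 && a[i] == b[j - 1] && a[i - 1] == b[j]) {
        best <- min(best, d[i - 1, j - 1] + 1L)  # adjacent transposition
      }
      d[i + 1, j + 1] <- best
    }
  }
  d[la + 1, lb + 1]
}

osa_sketch("string1", "string2")  # 1
osa_sketch("ab", "ba")            # 1 (one transposition)
osa_sketch("ca", "abc")           # 3
```

The last call shows the restriction at work: the unrestricted Damerau-Levenshtein distance for that pair is 2, because it may transpose "ca" to "ac" and then insert "b" between the swapped characters.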
Provides the edit operations required to transform one string into another using the OSA algorithm.
osa_editops(s1, s2)
s1 |
A string to transform. |
s2 |
A target string. |
A data frame with the following columns:
type: The type of operation (delete, insert, replace).
src_pos: The position in the source string.
dest_pos: The position in the target string.
osa_editops("string1", "string2")
Calculates the normalized OSA distance between two strings.
osa_normalized_distance(s1, s2, score_cutoff = 1)
s1 |
A string to compare. |
s2 |
Another string to compare. |
score_cutoff |
A threshold for the normalized distance score (default is 1.0). |
A double representing the normalized distance score.
osa_normalized_distance("string1", "string2")
Calculates the normalized similarity between two strings using the Optimal String Alignment (OSA) algorithm.
osa_normalized_similarity(s1, s2, score_cutoff = 0)
s1 |
A string to compare. |
s2 |
Another string to compare. |
score_cutoff |
A threshold for the normalized similarity score (default is 0.0). |
A double representing the normalized similarity score.
osa_normalized_similarity("string1", "string2")
Calculates the OSA similarity between two strings.
osa_similarity(s1, s2, score_cutoff = 0L)
s1 |
A string to compare. |
s2 |
Another string to compare. |
score_cutoff |
A threshold for the similarity score (default is 0). |
An integer representing the OSA similarity.
osa_similarity("string1", "string2")
Calculates the distance between the postfixes of two strings.
postfix_distance(s1, s2, score_cutoff = NULL)
s1 |
A string to compare. |
s2 |
Another string to compare. |
score_cutoff |
A threshold for the distance score (default is the maximum possible size_t value). |
An integer representing the postfix distance.
postfix_distance("string1", "string2")
Calculates the normalized distance between the postfixes of two strings.
postfix_normalized_distance(s1, s2, score_cutoff = 1)
s1 |
A string to compare. |
s2 |
Another string to compare. |
score_cutoff |
A threshold for the normalized distance score (default is 1.0). |
A double representing the normalized postfix distance.
postfix_normalized_distance("string1", "string2")
Calculates the normalized similarity between the postfixes of two strings.
postfix_normalized_similarity(s1, s2, score_cutoff = 0)
s1 |
A string to compare. |
s2 |
Another string to compare. |
score_cutoff |
A threshold for the normalized similarity score (default is 0.0). |
A double representing the normalized postfix similarity.
postfix_normalized_similarity("string1", "string2")
Calculates the similarity between the postfixes of two strings.
postfix_similarity(s1, s2, score_cutoff = 0L)
s1 |
A string to compare. |
s2 |
Another string to compare. |
score_cutoff |
A threshold for the similarity score (default is 0). |
An integer representing the postfix similarity.
postfix_similarity("string1", "string2")
Computes the prefix distance, which measures the number of character edits required to convert one prefix into another. This includes insertions, deletions, and substitutions.
prefix_distance(s1, s2, score_cutoff = NULL)
s1 |
A string. The first input string. |
s2 |
A string. The second input string. |
score_cutoff |
An optional maximum threshold for the distance. If NULL (the default), the largest integer value in R ('.Machine$integer.max') is used. |
The prefix distance as an integer.
prefix_distance("abcdef", "abcxyz")
prefix_distance("abcdef", "abcxyz", score_cutoff = 3)
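One way to read the prefix metrics is through the length of the common prefix. The sketch below assumes the upstream RapidFuzz convention, in which the similarity is the number of leading characters the strings share and the distance is the longer string length minus that similarity. `common_prefix_length` is a hypothetical helper, and the package's behavior should be checked against it rather than inferred from it.

```r
common_prefix_length <- function(s1, s2) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  n <- min(length(a), length(b))
  same <- a[seq_len(n)] == b[seq_len(n)]
  # Stop counting at the first position where the strings differ.
  if (all(same)) n else which(!same)[1] - 1L
}

prefix_sim <- common_prefix_length("abcdef", "abcxyz")            # 3 ("abc")
# Assumed definition: longer length minus the shared prefix length.
prefix_dist <- max(nchar("abcdef"), nchar("abcxyz")) - prefix_sim  # 3
```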
Computes the normalized distance of the prefixes of two strings, where the result is between 0.0 (identical) and 1.0 (completely different).
prefix_normalized_distance(s1, s2, score_cutoff = 1)
s1 |
A string. The first input string. |
s2 |
A string. The second input string. |
score_cutoff |
An optional maximum threshold for the normalized distance. Defaults to 1.0. |
The normalized prefix distance as a double.
prefix_normalized_distance("abcdef", "abcxyz")
prefix_normalized_distance("abcdef", "abcxyz", score_cutoff = 0.5)
Computes the normalized similarity of the prefixes of two strings, where the result is between 0.0 (completely different) and 1.0 (identical).
prefix_normalized_similarity(s1, s2, score_cutoff = 0)
s1 |
A string. The first input string. |
s2 |
A string. The second input string. |
score_cutoff |
An optional minimum threshold for the normalized similarity. Defaults to 0.0. |
The normalized prefix similarity as a double.
prefix_normalized_similarity("abcdef", "abcxyz")
prefix_normalized_similarity("abcdef", "abcxyz", score_cutoff = 0.7)
Computes the similarity of the prefixes of two strings based on their number of matching characters.
prefix_similarity(s1, s2, score_cutoff = 0L)
s1 |
A string. The first input string. |
s2 |
A string. The second input string. |
score_cutoff |
An optional minimum threshold for the similarity score. Defaults to 0. |
The prefix similarity as an integer.
prefix_similarity("abcdef", "abcxyz")
prefix_similarity("abcdef", "abcxyz", score_cutoff = 3)
Processes a given input string by applying optional trimming, case conversion, and ASCII transliteration.
processString(input, processor = TRUE, asciify = FALSE)
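input |
The input string to process. |
processor |
If TRUE, trims leading and trailing whitespace and converts the string to lowercase (default: TRUE). |
asciify |
If TRUE, replaces accented or special characters with their closest ASCII equivalents (default: FALSE). |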
The function applies the following transformations to the input string, in this order:
Trimming (if processor = TRUE): Removes leading and trailing whitespace.
Lowercasing (if processor = TRUE): Converts all characters to lowercase.
ASCII Transliteration (if asciify = TRUE): Replaces accented or special characters with their closest ASCII equivalents.
A std::string representing the processed string.
# Example usage
processString(" Éxâmple! ", processor = TRUE, asciify = TRUE)
# Returns: "example!"
processString(" Éxâmple! ", processor = TRUE, asciify = FALSE)
# Returns: "éxâmple!"
processString(" Éxâmple! ", processor = FALSE, asciify = TRUE)
# Returns: "Éxâmple!"
processString(" Éxâmple! ", processor = FALSE, asciify = FALSE)
# Returns: " Éxâmple! "
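The three processing steps can be approximated in base R as shown below. `process_sketch` is a hypothetical helper that mirrors the documented order of operations. It relies on `iconv()` for transliteration, whose output depends on the platform's iconv library, so results for accented input may differ from the package's.

```r
process_sketch <- function(input, processor = TRUE, asciify = FALSE) {
  out <- input
  if (processor) {
    out <- trimws(out)   # remove leading and trailing whitespace
    out <- tolower(out)  # convert to lowercase
  }
  if (asciify) {
    # Platform-dependent: some iconv builds do not support //TRANSLIT.
    out <- iconv(out, to = "ASCII//TRANSLIT")
  }
  out
}

process_sketch("  Hello World  ")                     # "hello world"
process_sketch("  Hello World  ", processor = FALSE)  # "  Hello World  "
```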