Package 'RapidFuzz' reference manual

Title:	String Similarity Computation Using 'RapidFuzz'
Description:	Provides a high-performance interface for calculating string similarities and distances, leveraging the efficient library 'RapidFuzz' <https://github.com/rapidfuzz/rapidfuzz-cpp>. This package integrates the 'C++' implementation, allowing 'R' users to access cutting-edge algorithms for fuzzy matching and text analysis.
Authors:	Andre Leite [aut, cre], Hugo Vaconcelos [aut], Max Bachmann [ctb], Adam Cohen [ctb]
Maintainer:	Andre Leite <[email protected]>
License:	MIT + file LICENSE
Version:	1.0
Built:	2024-12-17 01:23:10 UTC
Source:	https://github.com/strategicprojects/rapidfuzz

Damerau-Levenshtein Distance

Description

Calculate the Damerau-Levenshtein distance between two strings.

Computes the Damerau-Levenshtein distance, which is an edit distance allowing transpositions in addition to substitutions, insertions, and deletions.

Usage

damerau_levenshtein_distance(s1, s2, score_cutoff = NULL)

Arguments

`s1`	A string. The first input string.
`s2`	A string. The second input string.
`score_cutoff`	An optional maximum threshold for the distance. If `NULL` (the default), the largest integer value in R ('.Machine$integer.max') is used.

Value

The Damerau-Levenshtein distance as an integer.

Examples

damerau_levenshtein_distance("abcdef", "abcfde")
damerau_levenshtein_distance("abcdef", "abcfde", score_cutoff = 3)

Normalized Damerau-Levenshtein Distance

Description

Calculate the normalized Damerau-Levenshtein distance between two strings.

Computes the normalized Damerau-Levenshtein distance, where the result is between 0.0 (identical) and 1.0 (completely different).

Usage

damerau_levenshtein_normalized_distance(s1, s2, score_cutoff = 1)

Arguments

`s1`	A string. The first input string.
`s2`	A string. The second input string.
`score_cutoff`	An optional maximum threshold for the normalized distance. Defaults to 1.0.

Value

The normalized Damerau-Levenshtein distance as a double.

Examples

damerau_levenshtein_normalized_distance("abcdef", "abcfde")
damerau_levenshtein_normalized_distance("abcdef", "abcfde", score_cutoff = 0.5)

Normalized Damerau-Levenshtein Similarity

Description

Calculate the normalized Damerau-Levenshtein similarity between two strings.

Computes the normalized similarity based on the Damerau-Levenshtein metric, where the result is between 0.0 (completely different) and 1.0 (identical).

Usage

damerau_levenshtein_normalized_similarity(s1, s2, score_cutoff = 0)

Arguments

`s1`	A string. The first input string.
`s2`	A string. The second input string.
`score_cutoff`	An optional minimum threshold for the normalized similarity. Defaults to 0.0.

Value

The normalized Damerau-Levenshtein similarity as a double.

Examples

damerau_levenshtein_normalized_similarity("abcdef", "abcfde")
damerau_levenshtein_normalized_similarity("abcdef", "abcfde", score_cutoff = 0.7)

Damerau-Levenshtein Similarity

Description

Calculate the Damerau-Levenshtein similarity between two strings.

Computes the similarity based on the Damerau-Levenshtein metric, which considers transpositions in addition to substitutions, insertions, and deletions.

Usage

damerau_levenshtein_similarity(s1, s2, score_cutoff = 0L)

Arguments

`s1`	A string. The first input string.
`s2`	A string. The second input string.
`score_cutoff`	An optional minimum threshold for the similarity score. Defaults to 0.

Value

The Damerau-Levenshtein similarity as an integer.

Examples

damerau_levenshtein_similarity("abcdef", "abcfde")
damerau_levenshtein_similarity("abcdef", "abcfde", score_cutoff = 3)

Apply Edit Operations to String

Description

Applies edit operations to transform a string.

Usage

editops_apply_str(editops, s1, s2)

Arguments

`editops`	A data frame of edit operations (type, src_pos, dest_pos).
`s1`	The source string.
`s2`	The target string.

Value

The transformed string.

Apply Edit Operations to Vector

Description

Applies edit operations to transform a string, returning the result as a character vector.

Usage

editops_apply_vec(editops, s1, s2)

Arguments

`editops`	A data frame of edit operations (type, src_pos, dest_pos).
`s1`	The source string.
`s2`	The target string.

Value

A character vector representing the transformed string.

Extract Best Match

Description

Compares a query string to all strings in a list of choices and returns the best match with a similarity score above the score_cutoff.

Usage

extract_best_match(query, choices, score_cutoff = 50, processor = TRUE)

Arguments

`query`	The query string to compare.
`choices`	A vector of strings to compare against the query.
`score_cutoff`	A numeric value specifying the minimum similarity score (default is 50.0).
`processor`	A boolean indicating whether to preprocess strings before comparison (default is TRUE).

Value

A list containing the best matching string and its similarity score.

Extract Matches with Scoring and Limit

Description

Compares a query string to a list of choices using the specified scorer and returns the top matches with a similarity score above the cutoff.

Usage

extract_matches(
  query,
  choices,
  score_cutoff = 50,
  limit = 3L,
  processor = TRUE,
  scorer = "WRatio"
)

Arguments

`query`	The query string to compare.
`choices`	A vector of strings to compare against the query.
`score_cutoff`	A numeric value specifying the minimum similarity score (default is 50.0).
`limit`	The maximum number of matches to return (default is 3).
`processor`	A boolean indicating whether to preprocess strings before comparison (default is TRUE).
`scorer`	A string specifying the similarity scoring method ("WRatio", "Ratio", "PartialRatio", etc.).

Value

A data frame containing the top matched strings and their similarity scores.
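The score, filter, sort, and truncate steps described above can be sketched in base R. This is only an illustration: `sketch_extract` is a hypothetical helper, the scorer is a stand-in (normalized Levenshtein similarity scaled to 0-100, not "WRatio"), and treating the cutoff as inclusive is an assumption.

```r
# Hypothetical helper, not part of the package: score, filter, sort, truncate.
sketch_extract <- function(query, choices, score_cutoff = 50, limit = 3L) {
  # Stand-in scorer: normalized Levenshtein similarity scaled to 0-100.
  d <- drop(utils::adist(query, choices))
  scores <- 100 * (1 - d / pmax(nchar(query), nchar(choices)))
  # Assumption: the cutoff is inclusive.
  keep <- scores >= score_cutoff
  out <- data.frame(choice = choices[keep], score = scores[keep],
                    stringsAsFactors = FALSE)
  # Best matches first, then keep at most `limit` rows.
  out <- out[order(-out$score), , drop = FALSE]
  head(out, limit)
}

res <- sketch_extract("apple", c("appel", "banana", "apples", "apple"), limit = 2L)
print(res)
```

With `limit = 2L`, only the two highest-scoring choices ("apple" and "apples") are kept; "banana" falls below the cutoff and "appel" is cut by the limit.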

Extract Matches

Description

Compares a query string to all strings in a list of choices and returns all elements with a similarity score above the score_cutoff.

Usage

extract_similar_strings(query, choices, score_cutoff = 50, processor = TRUE)

Arguments

`query`	The query string to compare.
`choices`	A vector of strings to compare against the query.
`score_cutoff`	A numeric value specifying the minimum similarity score (default is 50.0).
`processor`	A boolean indicating whether to preprocess strings before comparison (default is TRUE).

Value

A data frame containing matched strings and their similarity scores.

Partial Ratio Calculation

Description

Calculates a partial ratio between two strings, which ignores long mismatching substrings.

Usage

fuzz_partial_ratio(s1, s2, score_cutoff = 0)

Arguments

`s1`	First string.
`s2`	Second string.
`score_cutoff`	Optional score cutoff threshold (default: 0.0).

Value

A double representing the partial ratio between the two strings.

Examples

fuzz_partial_ratio("this is a test", "this is a test!")

Quick Ratio Calculation

Description

Calculates a quick ratio using fuzz ratio.

Usage

fuzz_QRatio(s1, s2, score_cutoff = 0)

Arguments

`s1`	First string.
`s2`	Second string.
`score_cutoff`	Optional score cutoff threshold (default: 0.0).

Value

A double representing the quick ratio between the two strings.

Examples

fuzz_QRatio("this is a test", "this is a test!")

Simple Ratio Calculation

Description

Calculates a simple ratio between two strings.

Usage

fuzz_ratio(s1, s2, score_cutoff = 0)

Arguments

`s1`	First string.
`s2`	Second string.
`score_cutoff`	Optional score cutoff threshold (default: 0.0).

Value

A double representing the ratio between the two strings.

Examples

fuzz_ratio("this is a test", "this is a test!")

Combined Token Ratio

Description

Calculates the maximum of the token set ratio and the token sort ratio.

Usage

fuzz_token_ratio(s1, s2, score_cutoff = 0)

Arguments

`s1`	First string.
`s2`	Second string.
`score_cutoff`	Optional score cutoff threshold (default: 0.0).

Value

A double representing the combined token ratio between the two strings.

Examples

fuzz_token_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")

Token Set Ratio Calculation

Description

Compares the unique and common words in the strings and calculates the ratio.

Usage

fuzz_token_set_ratio(s1, s2, score_cutoff = 0)

Arguments

`s1`	First string.
`s2`	Second string.
`score_cutoff`	Optional score cutoff threshold (default: 0.0).

Value

A double representing the token set ratio between the two strings.

Examples

fuzz_token_set_ratio("fuzzy wuzzy was a bear", "fuzzy fuzzy was a bear")

Token Sort Ratio Calculation

Description

Sorts the words in the strings and calculates the ratio between them.

Usage

fuzz_token_sort_ratio(s1, s2, score_cutoff = 0)

Arguments

`s1`	First string.
`s2`	Second string.
`score_cutoff`	Optional score cutoff threshold (default: 0.0).

Value

A double representing the token sort ratio between the two strings.

Examples

fuzz_token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
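The word-sorting step can be illustrated in base R. `sort_tokens` is a hypothetical helper, and splitting on whitespace is an assumption about how words are delimited; it is not the package's implementation.

```r
# Hypothetical helper: split on whitespace, sort the tokens, re-join them.
sort_tokens <- function(s) {
  tokens <- strsplit(trimws(s), "[[:space:]]+")[[1]]
  # method = "radix" sorts in C-locale order, independent of the session locale.
  paste(sort(tokens, method = "radix"), collapse = " ")
}

a <- sort_tokens("fuzzy wuzzy was a bear")
b <- sort_tokens("wuzzy fuzzy was a bear")
print(a)                # "a bear fuzzy was wuzzy"
print(identical(a, b))  # TRUE
```

Because both inputs reduce to the same sorted string, a ratio computed on the sorted forms no longer depends on word order.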

Weighted Ratio Calculation

Description

Calculates a weighted ratio based on other ratio algorithms.

Usage

fuzz_WRatio(s1, s2, score_cutoff = 0)

Arguments

`s1`	First string.
`s2`	Second string.
`score_cutoff`	Optional score cutoff threshold (default: 0.0).

Value

A double representing the weighted ratio between the two strings.

Examples

fuzz_WRatio("this is a test", "this is a test!")

Get Edit Operations

Description

Generates edit operations between two strings.

Usage

get_editops(s1, s2)

Arguments

`s1`	The source string.
`s2`	The target string.

Value

A data frame with edit operations.

Hamming Distance

Description

Calculates the Hamming distance between two strings.

Usage

hamming_distance(s1, s2, pad = TRUE)

Arguments

`s1`	The first string.
`s2`	The second string.
`pad`	If true, the strings are padded to the same length (default: TRUE).

Value

An integer representing the Hamming distance.

Examples

hamming_distance("karolin", "kathrin")
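For reference, the Hamming metric counts positions at which two strings differ. The base R sketch below uses a hypothetical helper, `hamming_sketch`. It assumes that padding makes each extra character of the longer string count as one mismatch, and that unequal lengths are an error when `pad = FALSE`. Both are assumptions, not documented package behavior.

```r
# Hypothetical helper, not the package's implementation.
hamming_sketch <- function(s1, s2, pad = TRUE) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  if (length(a) != length(b) && !pad) {
    # Assumption: unequal lengths are rejected without padding.
    stop("strings must have equal length when pad = FALSE")
  }
  n <- min(length(a), length(b))
  # Mismatches in the overlapping part, plus the length difference.
  sum(a[seq_len(n)] != b[seq_len(n)]) + abs(length(a) - length(b))
}

print(hamming_sketch("karolin", "kathrin"))  # 3
```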

Normalized Hamming Distance

Description

Calculates the normalized Hamming distance between two strings.

Usage

hamming_normalized_distance(s1, s2, pad = TRUE)

Arguments

`s1`	The first string.
`s2`	The second string.
`pad`	If true, the strings are padded to the same length (default: TRUE).

Value

A value between 0 and 1 representing the normalized distance.

Examples

hamming_normalized_distance("karolin", "kathrin")

Normalized Hamming Similarity

Description

Calculates the normalized Hamming similarity between two strings.

Usage

hamming_normalized_similarity(s1, s2, pad = TRUE)

Arguments

`s1`	The first string.
`s2`	The second string.
`pad`	If true, the strings are padded to the same length (default: TRUE).

Value

A value between 0 and 1 representing the normalized similarity.

Examples

hamming_normalized_similarity("karolin", "kathrin")

Hamming Similarity

Description

Measures the similarity between two strings using the Hamming metric.

Usage

hamming_similarity(s1, s2, pad = TRUE)

Arguments

`s1`	The first string.
`s2`	The second string.
`pad`	If true, the strings are padded to the same length (default: TRUE).

Value

An integer representing the similarity.

Examples

hamming_similarity("karolin", "kathrin")

Indel Distance

Description

Calculates the insertion/deletion (Indel) distance between two strings.

Usage

indel_distance(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A numeric value representing the Indel distance.

Examples

indel_distance("kitten", "sitting")
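An insertion/deletion-only distance can be cross-checked in base R with `utils::adist()`, by pricing a substitution at 2 so that it is never cheaper than one deletion plus one insertion. `indel_sketch` is a hypothetical name used only for this illustration.

```r
# Indel distance via base R: a substitution costs the same as delete + insert.
indel_sketch <- function(s1, s2) {
  drop(utils::adist(
    s1, s2,
    costs = c(insertions = 1, deletions = 1, substitutions = 2)
  ))
}

print(indel_sketch("kitten", "sitting"))  # 5
```

The value 5 matches the identity length(s1) + length(s2) - 2 * LCS, since the longest common subsequence of the two strings ("ittn") has length 4.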

Normalized Indel Distance

Description

Calculates the normalized insertion/deletion (Indel) distance between two strings.

Usage

indel_normalized_distance(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A numeric value between 0 and 1 representing the normalized Indel distance.

Examples

indel_normalized_distance("kitten", "sitting")

Normalized Indel Similarity

Description

Calculates the normalized insertion/deletion (Indel) similarity between two strings.

Usage

indel_normalized_similarity(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A numeric value between 0 and 1 representing the normalized Indel similarity.

Examples

indel_normalized_similarity("kitten", "sitting")

Indel Similarity

Description

Calculates the insertion/deletion (Indel) similarity between two strings.

Usage

indel_similarity(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A numeric value representing the Indel similarity.

Examples

indel_similarity("kitten", "sitting")

Jaro Distance

Description

Calculates the Jaro distance between two strings, a value between 0 and 1.

Usage

jaro_distance(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A numeric value representing the Jaro distance.

Examples

jaro_distance("kitten", "sitting")

Normalized Jaro Distance

Description

Calculates the normalized Jaro distance between two strings, a value between 0 and 1.

Usage

jaro_normalized_distance(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A numeric value representing the normalized Jaro distance.

Examples

jaro_normalized_distance("kitten", "sitting")

Normalized Jaro Similarity

Description

Calculates the normalized Jaro similarity between two strings, a value between 0 and 1.

Usage

jaro_normalized_similarity(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A numeric value representing the normalized Jaro similarity.

Examples

jaro_normalized_similarity("kitten", "sitting")

Jaro Similarity

Description

Calculates the Jaro similarity between two strings, a value between 0 and 1.

Usage

jaro_similarity(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A numeric value representing the Jaro similarity.

Examples

jaro_similarity("kitten", "sitting")
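For readers unfamiliar with the metric, the textbook Jaro similarity can be sketched in base R as follows. `jaro_sketch` is a hypothetical helper that follows the standard definition (matching window, matched characters, transpositions); it is not the package's C++ implementation and may differ from it in edge cases.

```r
# Hypothetical helper implementing the textbook Jaro similarity.
jaro_sketch <- function(s1, s2) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  la <- length(a)
  lb <- length(b)
  if (la == 0 && lb == 0) return(1)
  if (la == 0 || lb == 0) return(0)
  # Characters match only if they are equal and within this distance.
  window <- max(floor(max(la, lb) / 2) - 1, 0)
  a_hit <- rep(FALSE, la)
  b_hit <- rep(FALSE, lb)
  for (i in seq_len(la)) {
    lo <- max(1, i - window)
    hi <- min(lb, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!b_hit[j] && a[i] == b[j]) {
        a_hit[i] <- TRUE
        b_hit[j] <- TRUE
        break
      }
    }
  }
  m <- sum(a_hit)
  if (m == 0) return(0)
  # Compare matched characters in order; half the mismatches are transpositions.
  transp <- sum(a[a_hit] != b[b_hit]) / 2
  (m / la + m / lb + (m - transp) / m) / 3
}

print(jaro_sketch("kitten", "sitting"))  # about 0.746
```

For "kitten" and "sitting" there are four matched characters and no transpositions, giving (4/6 + 4/7 + 1) / 3.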

Jaro-Winkler Distance

Description

Calculates the Jaro-Winkler distance between two strings.

Usage

jaro_winkler_distance(s1, s2, prefix_weight = 0.1)

Arguments

`s1`	The first string.
`s2`	The second string.
`prefix_weight`	The weight applied to the prefix (default: 0.1).

Value

A numeric value representing the Jaro-Winkler distance.

Examples

jaro_winkler_distance("kitten", "sitting")

Normalized Jaro-Winkler Distance

Description

Calculates the normalized Jaro-Winkler distance between two strings.

Usage

jaro_winkler_normalized_distance(s1, s2, prefix_weight = 0.1)

Arguments

`s1`	The first string.
`s2`	The second string.
`prefix_weight`	The weight applied to the prefix (default: 0.1).

Value

A numeric value representing the normalized Jaro-Winkler distance.

Examples

jaro_winkler_normalized_distance("kitten", "sitting")

Normalized Jaro-Winkler Similarity

Description

Calculates the normalized Jaro-Winkler similarity between two strings.

Usage

jaro_winkler_normalized_similarity(s1, s2, prefix_weight = 0.1)

Arguments

`s1`	The first string.
`s2`	The second string.
`prefix_weight`	The weight applied to the prefix (default: 0.1).

Value

A numeric value representing the normalized Jaro-Winkler similarity.

Examples

jaro_winkler_normalized_similarity("kitten", "sitting")

Jaro-Winkler Similarity

Description

Calculates the Jaro-Winkler similarity between two strings.

Usage

jaro_winkler_similarity(s1, s2, prefix_weight = 0.1)

Arguments

`s1`	The first string.
`s2`	The second string.
`prefix_weight`	The weight applied to the prefix (default: 0.1).

Value

A numeric value representing the Jaro-Winkler similarity.

Examples

jaro_winkler_similarity("kitten", "sitting")

LCSseq Distance

Description

Calculates the LCSseq (Longest Common Subsequence) distance between two strings.

Usage

lcs_seq_distance(s1, s2, score_cutoff = NULL)

Arguments

`s1`	The first string.
`s2`	The second string.
`score_cutoff`	Score threshold to stop calculation. If `NULL` (the default), the maximum possible value is used.

Value

A numeric value representing the LCSseq distance.

Examples

lcs_seq_distance("kitten", "sitting")

LCSseq Edit Operations

Description

Calculates the edit operations required to transform one string into another.

Usage

lcs_seq_editops(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A data.frame containing the edit operations (substitutions, insertions, and deletions).

Examples

lcs_seq_editops("kitten", "sitting")

Normalized LCSseq Distance

Description

Calculates the normalized LCSseq distance between two strings.

Usage

lcs_seq_normalized_distance(s1, s2, score_cutoff = 1)

Arguments

`s1`	The first string.
`s2`	The second string.
`score_cutoff`	Score threshold to stop calculation. Default is 1.0.

Value

A numeric value representing the normalized LCSseq distance.

Examples

lcs_seq_normalized_distance("kitten", "sitting")

Normalized LCSseq Similarity

Description

Calculates the normalized LCSseq similarity between two strings.

Usage

lcs_seq_normalized_similarity(s1, s2, score_cutoff = 0)

Arguments

`s1`	The first string.
`s2`	The second string.
`score_cutoff`	Score threshold to stop calculation. Default is 0.0.

Value

A numeric value representing the normalized LCSseq similarity.

Examples

lcs_seq_normalized_similarity("kitten", "sitting")

LCSseq Similarity

Description

Calculates the LCSseq similarity between two strings.

Usage

lcs_seq_similarity(s1, s2, score_cutoff = 0L)

Arguments

`s1`	The first string.
`s2`	The second string.
`score_cutoff`	Score threshold to stop calculation. Default is 0.

Value

A numeric value representing the LCSseq similarity.

Examples

lcs_seq_similarity("kitten", "sitting")
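The length of the longest common subsequence can be computed with a small dynamic-programming table. The sketch below is base R only. `lcs_length` is a hypothetical helper, and reading the LCSseq similarity as this length is an assumption about the package rather than something stated in this manual.

```r
# Hypothetical helper: length of the longest common subsequence.
lcs_length <- function(s1, s2) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  # tab[i + 1, j + 1] is the LCS length of the first i chars of a
  # and the first j chars of b.
  tab <- matrix(0L, nrow = length(a) + 1, ncol = length(b) + 1)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      tab[i + 1, j + 1] <- if (a[i] == b[j]) {
        tab[i, j] + 1L
      } else {
        max(tab[i, j + 1], tab[i + 1, j])
      }
    }
  }
  tab[length(a) + 1, length(b) + 1]
}

print(lcs_length("kitten", "sitting"))  # 4, the subsequence "ittn"
```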

Levenshtein Distance

Description

Calculates the Levenshtein distance between two strings, which represents the minimum number of insertions, deletions, and substitutions required to transform one string into the other.

Usage

levenshtein_distance(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A numeric value representing the Levenshtein distance.

Examples

levenshtein_distance("kitten", "sitting")
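As a cross-check that does not depend on this package, base R's `utils::adist()` also computes the Levenshtein distance with unit costs. The example only illustrates the metric itself.

```r
# Base R cross-check: unit-cost insertions, deletions, and substitutions.
d <- drop(utils::adist("kitten", "sitting"))
print(d)  # 3
```

The three edits are k -> s, e -> i, and the insertion of the final g.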

Normalized Levenshtein Distance

Description

The normalized Levenshtein distance is the Levenshtein distance divided by the maximum length of the compared strings, returning a value between 0 and 1.

Usage

levenshtein_normalized_distance(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A numeric value representing the normalized Levenshtein distance.

Examples

levenshtein_normalized_distance("kitten", "sitting")
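The normalization described above (distance divided by the longer string's length) can be reproduced in base R. `lev_norm_distance` is a hypothetical helper, and returning 0 for two empty strings is an assumption.

```r
# Hypothetical helper: Levenshtein distance divided by the longer length.
lev_norm_distance <- function(s1, s2) {
  longest <- max(nchar(s1), nchar(s2))
  if (longest == 0) return(0)  # assumption: two empty strings are identical
  drop(utils::adist(s1, s2)) / longest
}

print(lev_norm_distance("kitten", "sitting"))  # 3 / 7, about 0.4286
```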

Normalized Levenshtein Similarity

Description

The normalized Levenshtein similarity returns a value between 0 and 1, indicating how similar the compared strings are.

Usage

levenshtein_normalized_similarity(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A numeric value representing the normalized Levenshtein similarity.

Examples

levenshtein_normalized_similarity("kitten", "sitting")

Levenshtein Similarity

Description

Levenshtein similarity measures how similar two strings are, based on the minimum number of operations required to make them identical.

Usage

levenshtein_similarity(s1, s2)

Arguments

`s1`	The first string.
`s2`	The second string.

Value

A numeric value representing the Levenshtein similarity.

Examples

levenshtein_similarity("kitten", "sitting")

Apply Opcodes to String

Description

Applies opcodes to transform a string.

Usage

opcodes_apply_str(opcodes, s1, s2)

Arguments

`opcodes`	A data frame of opcode transformations (type, src_begin, src_end, dest_begin, dest_end).
`s1`	The source string.
`s2`	The target string.

Value

The transformed string.

Apply Opcodes to Vector

Description

Applies opcodes to transform a string.

Usage

opcodes_apply_vec(opcodes, s1, s2)

Arguments

`opcodes`	A data frame of opcode transformations (type, src_begin, src_end, dest_begin, dest_end).
`s1`	The source string.
`s2`	The target string.

Value

A character vector representing the transformed string.

Distance Using OSA

Description

Calculates the Optimal String Alignment (OSA) distance between two strings.

Usage

osa_distance(s1, s2, score_cutoff = NULL)

Arguments

`s1`	A string to compare.
`s2`	Another string to compare.
`score_cutoff`	A threshold for the distance score. If `NULL` (the default), the maximum possible size_t value is used.

Value

An integer representing the OSA distance.

Examples

osa_distance("string1", "string2")
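The textbook OSA recurrence extends Levenshtein with adjacent transpositions, under the restriction that no substring is edited more than once. The base R sketch below follows that standard definition. `osa_sketch` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper implementing the textbook OSA recurrence.
osa_sketch <- function(s1, s2) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  n <- length(a)
  m <- length(b)
  # d[i + 1, j + 1] is the distance between the first i chars of a
  # and the first j chars of b.
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- as.integer(a[i] != b[j])
      best <- min(d[i, j + 1] + 1L,  # deletion
                  d[i + 1, j] + 1L,  # insertion
                  d[i, j] + cost)    # substitution or match
      # Adjacent transposition of the last two characters.
      if (i > 1 && j > 1 && a[i] == b[j - 1] && a[i - 1] == b[j]) {
        best <- min(best, d[i - 1, j - 1] + 1L)
      }
      d[i + 1, j + 1] <- best
    }
  }
  d[n + 1, m + 1]
}

print(osa_sketch("string1", "string2"))  # 1
print(osa_sketch("ab", "ba"))            # 1, a single transposition
print(osa_sketch("ca", "abc"))           # 3
```

The last case shows the restriction at work: an unrestricted Damerau-Levenshtein distance would be 2 for "ca" and "abc", because it may transpose and then insert between the transposed characters.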

Edit Operations Using OSA

Description

Provides the edit operations required to transform one string into another using the OSA algorithm.

Usage

osa_editops(s1, s2)

Arguments

`s1`	A string to transform.
`s2`	A target string.

Value

A data frame with the following columns:

operation: The type of operation (delete, insert, replace).
source_position: The position in the source string.
destination_position: The position in the target string.

Examples

osa_editops("string1", "string2")

Normalized Distance Using OSA

Description

Calculates the normalized OSA distance between two strings.

Usage

osa_normalized_distance(s1, s2, score_cutoff = 1)

Arguments

`s1`	A string to compare.
`s2`	Another string to compare.
`score_cutoff`	A threshold for the normalized distance score (default is 1.0).

Value

A double representing the normalized distance score.

Examples

osa_normalized_distance("string1", "string2")

Normalized Similarity Using OSA

Description

Calculates the normalized similarity between two strings using the Optimal String Alignment (OSA) algorithm.

Usage

osa_normalized_similarity(s1, s2, score_cutoff = 0)

Arguments

`s1`	A string to compare.
`s2`	Another string to compare.
`score_cutoff`	A threshold for the normalized similarity score (default is 0.0).

Value

A double representing the normalized similarity score.

Examples

osa_normalized_similarity("string1", "string2")

Similarity Using OSA

Description

Calculates the OSA similarity between two strings.

Usage

osa_similarity(s1, s2, score_cutoff = 0L)

Arguments

`s1`	A string to compare.
`s2`	Another string to compare.
`score_cutoff`	A threshold for the similarity score (default is 0).

Value

An integer representing the OSA similarity.

Examples

osa_similarity("string1", "string2")

Postfix Distance

Description

Calculates the distance between the postfixes of two strings.

Usage

postfix_distance(s1, s2, score_cutoff = NULL)

Arguments

`s1`	A string to compare.
`s2`	Another string to compare.
`score_cutoff`	A threshold for the distance score. If `NULL` (the default), the maximum possible size_t value is used.

Value

An integer representing the postfix distance.

Examples

postfix_distance("string1", "string2")

Normalized Postfix Distance

Description

Calculates the normalized distance between the postfixes of two strings.

Usage

postfix_normalized_distance(s1, s2, score_cutoff = 1)

Arguments

`s1`	A string to compare.
`s2`	Another string to compare.
`score_cutoff`	A threshold for the normalized distance score (default is 1.0).

Value

A double representing the normalized postfix distance.

Examples

postfix_normalized_distance("string1", "string2")

Normalized Postfix Similarity

Description

Calculates the normalized similarity between the postfixes of two strings.

Usage

postfix_normalized_similarity(s1, s2, score_cutoff = 0)

Arguments

`s1`	A string to compare.
`s2`	Another string to compare.
`score_cutoff`	A threshold for the normalized similarity score (default is 0.0).

Value

A double representing the normalized postfix similarity.

Examples

postfix_normalized_similarity("string1", "string2")

Postfix Similarity

Description

Calculates the similarity between the postfixes of two strings.

Usage

postfix_similarity(s1, s2, score_cutoff = 0L)

Arguments

`s1`	A string to compare.
`s2`	Another string to compare.
`score_cutoff`	A threshold for the similarity score (default is 0).

Value

An integer representing the postfix similarity.

Examples

postfix_similarity("string1", "string2")

Calculate the prefix distance between two strings

Description

Computes the prefix distance, which measures the number of character edits required to convert one prefix into another. This includes insertions, deletions, and substitutions.

Usage

prefix_distance(s1, s2, score_cutoff = NULL)

Arguments

`s1`	A string. The first input string.
`s2`	A string. The second input string.
`score_cutoff`	An optional maximum threshold for the distance. If `NULL` (the default), the largest integer value in R ('.Machine$integer.max') is used.

Value

The prefix distance as an integer.

Examples

prefix_distance("abcdef", "abcxyz")
prefix_distance("abcdef", "abcxyz", score_cutoff = 3)

Calculate the normalized prefix distance between two strings

Description

Computes the normalized distance of the prefixes of two strings, where the result is between 0.0 (identical) and 1.0 (completely different).

Usage

prefix_normalized_distance(s1, s2, score_cutoff = 1)

Arguments

`s1`	A string. The first input string.
`s2`	A string. The second input string.
`score_cutoff`	An optional maximum threshold for the normalized distance. Defaults to 1.0.

Value

The normalized prefix distance as a double.

Examples

prefix_normalized_distance("abcdef", "abcxyz")
prefix_normalized_distance("abcdef", "abcxyz", score_cutoff = 0.5)

Calculate the normalized prefix similarity between two strings

Description

Computes the normalized similarity of the prefixes of two strings, where the result is between 0.0 (completely different) and 1.0 (identical).

Usage

prefix_normalized_similarity(s1, s2, score_cutoff = 0)

Arguments

`s1`	A string. The first input string.
`s2`	A string. The second input string.
`score_cutoff`	An optional minimum threshold for the normalized similarity. Defaults to 0.0.

Value

The normalized prefix similarity as a double.

Examples

prefix_normalized_similarity("abcdef", "abcxyz")
prefix_normalized_similarity("abcdef", "abcxyz", score_cutoff = 0.7)

Calculate the prefix similarity between two strings

Description

Computes the similarity of the prefixes of two strings based on their number of matching characters.

Usage

prefix_similarity(s1, s2, score_cutoff = 0L)

Arguments

`s1`	A string. The first input string.
`s2`	A string. The second input string.
`score_cutoff`	An optional minimum threshold for the similarity score. Defaults to 0.

Value

The prefix similarity as an integer.

Examples

prefix_similarity("abcdef", "abcxyz")
prefix_similarity("abcdef", "abcxyz", score_cutoff = 3)
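Counting matching leading characters can be sketched in base R. `common_prefix_length` is a hypothetical helper that reads "number of matching characters" as the length of the common prefix, which is an assumption about the package's definition.

```r
# Hypothetical helper: length of the common prefix of two strings.
common_prefix_length <- function(s1, s2) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  n <- min(length(a), length(b))
  if (n == 0) return(0L)
  # Position of the first mismatch within the overlapping part, if any.
  mismatch <- which(a[seq_len(n)] != b[seq_len(n)])
  if (length(mismatch) == 0) n else mismatch[1] - 1L
}

print(common_prefix_length("abcdef", "abcxyz"))  # 3
```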

Process a String

Description

Processes a given input string by applying optional trimming, case conversion, and ASCII transliteration.

Usage

processString(input, processor = TRUE, asciify = FALSE)

Arguments

`input`	A string to be processed.
`processor`	A logical indicating whether to trim whitespace and convert the string to lowercase. Default is `TRUE`.
`asciify`	A logical indicating whether to transliterate non-ASCII characters to their closest ASCII equivalents. Default is `FALSE`.

Details

The function applies the following transformations to the input string, in this order:

Trimming (if processor = TRUE): Removes leading and trailing whitespace.
Lowercasing (if processor = TRUE): Converts all characters to lowercase.
ASCII Transliteration (if asciify = TRUE): Replaces accented or special characters with their closest ASCII equivalents.
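The ordering of the three steps can be mimicked in base R. `process_sketch` is a hypothetical helper, and using `iconv()` with "ASCII//TRANSLIT" for the transliteration step is an assumption: its support and output vary by platform, and it is not necessarily what the package does.

```r
# Hypothetical helper mirroring the documented order of transformations.
process_sketch <- function(input, processor = TRUE, asciify = FALSE) {
  out <- input
  if (processor) {
    # Steps 1 and 2: trim surrounding whitespace, then lowercase.
    out <- tolower(trimws(out))
  }
  if (asciify) {
    # Step 3 (assumption): platform-dependent transliteration via iconv.
    out <- iconv(out, to = "ASCII//TRANSLIT")
  }
  out
}

print(process_sketch("  Example!  "))                     # "example!"
print(process_sketch("  Example!  ", processor = FALSE))  # "  Example!  "
```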

Value

A string representing the processed input.

Examples

# Example usage
processString("  Éxâmple!  ", processor = TRUE, asciify = TRUE)
# Returns: "example!"

processString("  Éxâmple!  ", processor = TRUE, asciify = FALSE)
# Returns: "éxâmple!"

processString("  Éxâmple!  ", processor = FALSE, asciify = TRUE)
# Returns: "Éxâmple!"

processString("  Éxâmple!  ", processor = FALSE, asciify = FALSE)
# Returns: "  Éxâmple!  "
