Module util

This module contains miscellaneous helper functions for the KOReader frontend.

Functions

stripPunctuation (text) Strips all punctuation marks and spaces from a string.
rtrim (s) Remove trailing whitespace from string.
trim (s) Remove leading & trailing whitespace from string.
cleanupSelectedText (text) Variant tailored for text selection purposes (originally implemented in ReaderHighlight).
gsplit (str, pattern, capture, capture_empty_entity) Splits a string by a pattern

Lua doesn't have a string.split() function and most of the time you don't really need it because string.gmatch() is enough.

tableEquals (o1, o2, ignore_mt) Compares values in two different tables.
tableDeepCopy (o) Makes a deep copy of a table.
tableSize (t) Returns number of keys in a table.
tableGetValue (t, ...) Returns a value of a key, checks if all parent keys are not empty.
tableSetValue (t, value, ...) Sets a value of a key, creates all parent keys if needed.
tableRemoveValue (t, ...) Removes a key in a table, removes all empty parent keys.
arrayAppend (t1, t2) Append all elements from t2 into t1.
arrayReverse (t) Reverse array elements in-place in table t
arrayContains (t, v, callback) Test whether t contains a value equal to v (or such a value that callback returns true), and if so, return the index.
arrayReferences (t, n, m) Test whether array t contains a reference to array n (at any depth at or below m)
bsearch_left (array, value) Perform a leftmost insertion binary search for value in a sorted (ascending) array.
bsearch_right (array, value) Perform a rightmost insertion binary search for value in a sorted (ascending) array.
lastIndexOf (string, ch) Gets last index of character in string (i.e., strrchr)

Returns the index within this string of the last occurrence of the specified character or -1 if the character does not occur.

utf8Reverse (string) Reverse the individual greater-than-single-byte characters
splitToChars (text) Splits string into a list of UTF-8 characters.
isCJKChar (c) Tests whether c is a CJK character
hasCJKChar (str) Tests whether str contains CJK characters
splitToWords (text) Split texts into a list of words, spaces and punctuation marks.
isSplittable (c, next_c, prev_c) Test whether a string can be separated by this char for multi-line rendering.
getFilesystemType (path) Gets filesystem type of a path.
calcFreeMem () Computes the currently available memory
findFiles (path, callback) Recursively scan directory for files inside
isEmptyDir (path) Checks if directory is empty.
fileExists (path) Checks if the given path is a file.
pathExists (path) Checks if the given path exists.
directoryExists (path) Checks if the given directory exists.
makePath (path) As mkdir -p.
removePath (path) Remove as many of the empty directories specified in path, children-first.
removeFile (path) As rm
getSafeFilename (str, path, limit) Replaces characters that are invalid in filenames.
splitFilePathName (file) Splits a file into its directory path and file name.
splitFileNameSuffix (file) Splits a file name into its pure file name and suffix
getFileNameSuffix (filename) Gets file extension
getScriptType (filename) Companion helper function that returns the script's language, based on the file extension.
getFriendlySize (size, right_align) Gets human friendly size as string
getFormattedSize (size) Gets formatted size as string (1273334 => "1,273,334")
partialMD5 (filepath) Calculate partial digest of an open file.
fixUtf8 (str, replacement) Replaces invalid UTF-8 characters with a replacement string.
splitToArray (str, splitter, capture_empty_entity) Splits input string with the splitter into a table.
unicodeCodepointToUtf8 (c) Convert a Unicode codepoint (number) to a UTF-8 char. C.f., https://stackoverflow.com/a/4609989 and https://stackoverflow.com/a/38492214. See utf8charcode in ffi/util for a decoder.

htmlEntitiesToUtf8 (string) Replace HTML entities with their UTF-8 encoded equivalent in text.
htmlToPlainText (text) Convert simple HTML to plain text.
htmlToPlainTextIfHtml (text) Convert HTML to plain text if text seems to be HTML. Detection of HTML is simple and may raise false positives or negatives, but seems quite good at guessing content type of text found in EPUB's <dc:description>.
htmlEscape (text) Encode the HTML entities in a string
prettifyCSS (CSS) Prettify a CSS stylesheet. Not perfect, but enough to make some ugly CSS readable.
urlEncode (text) Encode a URL (also known as percent-encoding); see https://en.wikipedia.org/wiki/Percent-encoding
urlDecode (text) Decode URL (reverse process to util.urlEncode())
checkLuaSyntax (text) Check Lua syntax of string
stringStartsWith (str, start) Simple startsWith string helper.
stringEndsWith (str, ending) Simple endsWith string helper.
stringSearch (txt, str, start_pos) Search a string in a text.
wrapMethod (target_table, target_field_name, new_func, before_callback) Wrap (or replace) a table method with a custom method, in a revertable way.

Tables

t Remove elements from an array, fast.
args Escape list for shell usage
t Clear all the elements from an array without reassignment.

Fields

UTF8_CHAR_PATTERN Pattern which matches a single well-formed UTF-8 character, including theoretical >4-byte extensions.


Functions

stripPunctuation (text)
Strips all punctuation marks and spaces from a string.

Parameters:

  • text string the string to be stripped

Returns:

    string stripped text
rtrim (s)
Remove trailing whitespace from string.

Parameters:

  • s string the string to be trimmed

Returns:

    string trimmed text
trim (s)
Remove leading & trailing whitespace from string.

Parameters:

  • s string the string to be trimmed

Returns:

    string trimmed text
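
As a rough illustration of what rtrim and trim do, here is a minimal standalone sketch built on Lua patterns. It is an assumption about the behaviour described above, not KOReader's actual implementation, and the local function names only mirror the documented ones for readability.

```lua
-- Standalone sketch; not taken from KOReader's source.
local function rtrim(s)
    -- Delete the run of whitespace anchored at the end of the string.
    return (s:gsub("%s+$", ""))
end

local function trim(s)
    -- Capture whatever sits between leading and trailing whitespace.
    return (s:match("^%s*(.-)%s*$"))
end

print("[" .. rtrim("  padded  ") .. "]")
print("[" .. trim("  padded  ") .. "]")
```

The extra parentheses around gsub and match discard any additional return values, so each helper returns exactly one string.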
cleanupSelectedText (text)
Variant tailored for text selection purposes (originally implemented in ReaderHighlight).

Parameters:

  • text string the text to be trimmed

Returns:

    string trimmed text
gsplit (str, pattern, capture, capture_empty_entity)
Splits a string by a pattern

Lua doesn't have a string.split() function and most of the time you don't really need it because string.gmatch() is enough. However string.gmatch() has one significant disadvantage for me: You can't split a string while matching both the delimited strings and the delimiters themselves without tracking positions and substrings. The gsplit function below takes care of this problem.

Author: Peter Odding

License: MIT/X11

Source: http://snippets.luacode.org/snippets/Stringsplitting130

Parameters:

  • str string string to split
  • pattern the pattern to split against
  • capture boolean
  • capture_empty_entity boolean
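
The idea of yielding both the delimited pieces and the delimiters can be sketched as a small iterator. This is a simplified standalone illustration, not Peter Odding's code: the name gsplit_sketch is hypothetical, the capture_empty_entity flag is left out (empty pieces are always yielded), and the exact semantics of the real flags are assumptions.

```lua
-- Hypothetical sketch of a splitting iterator; not the KOReader implementation.
local function gsplit_sketch(str, pattern, capture)
    local pos = 1
    local pending          -- delimiter waiting to be yielded next
    local done = false
    return function()
        if pending then
            local delimiter = pending
            pending = nil
            return delimiter
        end
        if done then return nil end
        local s, e = str:find(pattern, pos)
        -- Empty matches (e < s) are treated as "no match" to avoid looping forever.
        if s and e >= s then
            local piece = str:sub(pos, s - 1)
            if capture then pending = str:sub(s, e) end
            pos = e + 1
            return piece
        end
        done = true
        return str:sub(pos)
    end
end

local parts = {}
for piece in gsplit_sketch("a, b,c", ",%s*", true) do
    parts[#parts + 1] = piece
end
print(#parts)
```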
tableEquals (o1, o2, ignore_mt)
Compares values in two different tables.

Source: https://stackoverflow.com/a/32660766/2470572

Parameters:

  • o1 Lua table
  • o2 Lua table
  • ignore_mt boolean

Returns:

    boolean
tableDeepCopy (o)
Makes a deep copy of a table.

Source: https://stackoverflow.com/a/16077650/2470572

Parameters:

  • o Lua table

Returns:

    Lua table
tableSize (t)
Returns number of keys in a table.

Parameters:

  • t Lua table

Returns:

    int number of keys in table t
tableGetValue (t, ...)
Returns a value of a key, checks if all parent keys are not empty.

Parameters:

  • t Lua table
  • ... parent keys, starting from the upper level

Returns:

    value of the last key or nil
tableSetValue (t, value, ...)
Sets a value of a key, creates all parent keys if needed.

Parameters:

  • t Lua table
  • value value to be assigned to the last key
  • ... parent keys, starting from the upper level
tableRemoveValue (t, ...)
Removes a key in a table, removes all empty parent keys.

Parameters:

  • t Lua table
  • ... parent keys, starting from the upper level
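
The three nested-key helpers above (tableGetValue, tableSetValue, tableRemoveValue) can be pictured with the following standalone sketch. The function names getValue, setValue and removeValue are hypothetical, the sketch assumes at least one key is passed to the setter and remover, and details such as how non-table intermediate values are handled are assumptions rather than documented behaviour.

```lua
-- Hypothetical sketches of nested-key access; not KOReader's source.
local function getValue(t, ...)
    local node = t
    for i = 1, select("#", ...) do
        if type(node) ~= "table" then return nil end
        node = node[(select(i, ...))]
    end
    return node
end

local function setValue(t, value, ...)
    local n = select("#", ...)
    local node = t
    for i = 1, n - 1 do
        local key = (select(i, ...))
        -- Create missing intermediate tables on the way down.
        if type(node[key]) ~= "table" then node[key] = {} end
        node = node[key]
    end
    node[(select(n, ...))] = value
end

local function removeValue(t, ...)
    local n = select("#", ...)
    local parents = { t }
    for i = 1, n - 1 do
        local child = parents[i][(select(i, ...))]
        if type(child) ~= "table" then return end -- path does not exist
        parents[i + 1] = child
    end
    -- Remove the leaf, then prune each parent that became empty.
    for i = n, 1, -1 do
        parents[i][(select(i, ...))] = nil
        if next(parents[i]) ~= nil then break end
    end
end

local settings = { x = 2 }
setValue(settings, 1, "a", "b", "c")
print(getValue(settings, "a", "b", "c"))
removeValue(settings, "a", "b", "c")
print(getValue(settings, "a"))
```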
arrayAppend (t1, t2)
Append all elements from t2 into t1.

Parameters:

  • t1 Lua table
  • t2 Lua table
arrayReverse (t)
Reverse array elements in-place in table t

Parameters:

  • t Lua table
arrayContains (t, v, callback)
Test whether t contains a value equal to v (or such a value that callback returns true), and if so, return the index.

Parameters:

  • t Lua table
  • v
  • callback function (v1, v2)
arrayReferences (t, n, m)
Test whether array t contains a reference to array n (at any depth at or below m)

Parameters:

  • t Lua table (array only)
  • n Lua table (array only)
  • m integer Max nesting level
bsearch_left (array, value)
Perform a leftmost insertion binary search for value in a sorted (ascending) array.

Parameters:

  • array Lua table (array only, sorted, ascending, every value must match the type of value and support comparison operators)
  • value

Returns:

    int leftmost insertion index of value in array.
bsearch_right (array, value)
Perform a rightmost insertion binary search for value in a sorted (ascending) array.

Parameters:

  • array Lua table (array only, sorted, ascending, every value must match the type of value and support comparison operators)
  • value

Returns:

    int rightmost insertion index of value in array.
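
To make "leftmost" and "rightmost" insertion concrete, here is a standalone sketch that follows the convention of Python's bisect module. Whether KOReader's functions return exactly these indices in every edge case is an assumption; the local names simply mirror the documented ones.

```lua
-- Standalone sketch of insertion-point binary searches; not KOReader's source.
local function bsearch_left(array, value)
    local lo, hi = 1, #array + 1
    while lo < hi do
        local mid = math.floor((lo + hi) / 2)
        if array[mid] < value then
            lo = mid + 1
        else
            hi = mid
        end
    end
    return lo -- first index whose element is not less than value
end

local function bsearch_right(array, value)
    local lo, hi = 1, #array + 1
    while lo < hi do
        local mid = math.floor((lo + hi) / 2)
        if value < array[mid] then
            hi = mid
        else
            lo = mid + 1
        end
    end
    return lo -- first index whose element is greater than value
end

local sorted = { 1, 2, 2, 2, 5 }
print(bsearch_left(sorted, 2), bsearch_right(sorted, 2))
```

With duplicates present, the two results bracket the run of equal values: inserting at the left index places the new value before them, inserting at the right index places it after them.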
lastIndexOf (string, ch)
Gets last index of character in string (i.e., strrchr)

Returns the index within this string of the last occurrence of the specified character or -1 if the character does not occur.

To find . you need to escape it.

Parameters:

Returns:

    int last occurrence or -1 if not found
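
The note about escaping . suggests that ch is treated as a Lua pattern. Under that assumption, a minimal standalone sketch of the described behaviour could look like this (it is not KOReader's implementation):

```lua
-- Standalone sketch; assumes ch is a Lua pattern, as the escaping note suggests.
local function lastIndexOf(s, ch)
    -- The greedy ".*" consumes as much as possible, so the position
    -- capture "()" lands on the last occurrence of ch.
    local pos = s:match(".*()" .. ch)
    return pos or -1
end

print(lastIndexOf("a/b/c", "/"))
print(lastIndexOf("a.b.c", "%."))
print(lastIndexOf("abc", "/"))
```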
utf8Reverse (string)
Reverse the individual greater-than-single-byte characters

Parameters:

splitToChars (text)
Splits string into a list of UTF-8 characters.

Parameters:

  • text string the string to be split.

Returns:

    table list of UTF-8 chars
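
One plausible way to split on UTF-8 character boundaries is to iterate over matches of a pattern like the UTF8_CHAR_PATTERN field documented below. The sketch here is standalone and uses its own pattern constant (decimal escapes, NUL bytes deliberately left out for portability across Lua versions); it is an assumption, not KOReader's code.

```lua
-- Standalone sketch; the pattern is modelled on Lua 5.4's utf8.charpattern
-- but skips the NUL byte so it behaves the same on Lua 5.1 through 5.4.
local UTF8_CHAR = "[\1-\127\194-\253][\128-\191]*"

local function splitToChars(text)
    local chars = {}
    for ch in text:gmatch(UTF8_CHAR) do
        chars[#chars + 1] = ch
    end
    return chars
end

-- "na\195\175ve" is "naïve" written with explicit UTF-8 bytes for the ï.
local chars = splitToChars("na\195\175ve")
print(#chars)
```

The input is 6 bytes long but yields 5 entries, because the two-byte sequence for ï stays together as one element.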
isCJKChar (c)
Tests whether c is a CJK character

Parameters:

Returns:

    boolean true if CJK
hasCJKChar (str)
Tests whether str contains CJK characters

Parameters:

Returns:

    boolean true if CJK
splitToWords (text)
Split texts into a list of words, spaces and punctuation marks.

Parameters:

Returns:

    table list of words, spaces and punctuation marks
isSplittable (c, next_c, prev_c)
Test whether a string can be separated by this char for multi-line rendering. Optional next or prev chars may be provided to help make the decision

Parameters:

Returns:

    boolean true if splittable, false if not
getFilesystemType (path)
Gets filesystem type of a path.

Checks if the path occurs in /proc/mounts

Parameters:

  • path string an absolute path

Returns:

    string filesystem type
calcFreeMem ()
Computes the currently available memory

Returns:

    tuple of ints: memavailable, memtotal (or nil, nil on unsupported platforms).
findFiles (path, callback)
Recursively scan directory for files inside

Parameters:

  • path string
  • callback function (fullpath, name, attr)
isEmptyDir (path)
Checks if directory is empty.

Parameters:

Returns:

    bool
fileExists (path)
Checks if the given path is a file.

Parameters:

Returns:

    bool
pathExists (path)
Checks if the given path exists. Doesn't care if it's a file or directory.

Parameters:

Returns:

    bool
directoryExists (path)
Checks if the given directory exists.

Parameters:

  • path
makePath (path)
As mkdir -p. Unlike lfs.mkdir(), does not error if the directory already exists, and creates intermediate directories as needed.

Parameters:

  • path string the directory to create

Returns:

    bool true on success; nil, err_message on error
removePath (path)
Remove as many of the empty directories specified in path, children-first. Does not fail if the directory is already gone.

Parameters:

  • path string the directory tree to prune

Returns:

    bool true on success; nil, err_message on error
removeFile (path)
As rm

Parameters:

  • path string of the file to remove

Returns:

    bool true on success; nil, err_message on error
getSafeFilename (str, path, limit)
Replaces characters that are invalid in filenames.

Replaces the characters \/:*?"<>| with an _ unless an optional path is provided. These characters are problematic on Windows filesystems. On Linux only the / poses a problem.

If an optional path is provided, util.getFilesystemType() will be used to determine whether stricter VFAT restrictions should be applied.

Parameters:

Returns:

    string safe filename
splitFilePathName (file)
Splits a file into its directory path and file name. If the given path has a trailing /, returns the entire path as the directory path and "" as the file name.

Parameters:

Returns:

    string directory, filename
splitFileNameSuffix (file)
Splits a file name into its pure file name and suffix

Parameters:

Returns:

    string path, extension
getFileNameSuffix (filename)
Gets file extension

Parameters:

Returns:

    string extension
getScriptType (filename)
Companion helper function that returns the script's language, based on the file extension.

Parameters:

Returns:

    string (lowercase) (or nil if not Device:canExecuteScript(file))
getFriendlySize (size, right_align)
Gets human friendly size as string

Parameters:

  • size integer (bytes)
  • right_align boolean (by padding with spaces on the left)

Returns:

    string
getFormattedSize (size)
Gets formatted size as string (1273334 => "1,273,334")

Parameters:

  • size integer (bytes)

Returns:

    string
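
The grouping shown in the example (1273334 => "1,273,334") can be reproduced with a repeated pattern substitution. This is a standalone sketch with a hypothetical function name, not KOReader's implementation, and it assumes a comma as the separator.

```lua
-- Standalone sketch of thousands grouping; the name is hypothetical.
local function formatWithSeparators(size)
    local formatted = string.format("%d", size)
    local count
    repeat
        -- Peel off the last ungrouped block of three digits on each pass.
        formatted, count = formatted:gsub("^(-?%d+)(%d%d%d)", "%1,%2")
    until count == 0
    return formatted
end

print(formatWithSeparators(1273334))
```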
partialMD5 (filepath)
Calculate partial digest of an open file. Only PDF documents can be modified by KOReader, which appends data at the end of the file when highlighting, so the digest uses an uneven sampling algorithm that weights the file head heavily and the file tail much less. This reduces the probability that appended data changes the digest value. Note that if the PDF file size is around 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864, 268435456 or 1073741824 bytes, appending data by highlighting in KOReader may change the digest value.

Parameters:

  • filepath
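
The sensitive file sizes listed above form a geometric series: each is four times the previous one, starting at 1024. The sketch below only reproduces that arithmetic; reading it as a list of sampling offsets is an inference from the note, and the sample length or any further offsets used by KOReader are not stated here.

```lua
-- Reproduces the sizes named in the note: 1024 * 4^i for i = 0..10.
-- Treating them as sampling offsets is an assumption, not documented fact.
local function sensitiveSizes()
    local sizes = {}
    local size = 1024
    for i = 1, 11 do
        sizes[i] = size
        size = size * 4
    end
    return sizes
end

local sizes = sensitiveSizes()
print(sizes[1], sizes[#sizes])
```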
fixUtf8 (str, replacement)
Replaces invalid UTF-8 characters with a replacement string.

Based on http://notebook.kulchenko.com/programming/fixing-malformed-utf8-in-lua. c.f., FixUTF8 @ https://github.com/pkulchenko/ZeroBraneStudio/blob/master/src/util.lua.

Parameters:

  • str string the string to be checked for invalid characters
  • replacement string the string to replace invalid characters with

Returns:

    string valid UTF-8
splitToArray (str, splitter, capture_empty_entity)
Splits input string with the splitter into a table. This function ignores the last empty entity.

Parameters:

  • str string the string to be split
  • splitter string
  • capture_empty_entity boolean

Returns:

    an array-like table
unicodeCodepointToUtf8 (c)
Convert a Unicode codepoint (number) to a UTF-8 char. C.f., https://stackoverflow.com/a/4609989 and https://stackoverflow.com/a/38492214.

See utf8charcode in ffi/util for a decoder.

Parameters:

  • c integer Unicode codepoint

Returns:

    string UTF-8 char
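
For reference, the standard UTF-8 encoding rules can be written out in plain Lua arithmetic. The sketch below is standalone, uses a hypothetical function name, and avoids bit libraries so that it runs on Lua 5.1 through 5.4; it is not KOReader's implementation and performs no surrogate checks.

```lua
-- Standalone sketch of UTF-8 encoding; the name is hypothetical.
local function codepointToUtf8(c)
    local floor, char = math.floor, string.char
    if c < 0x80 then
        return char(c)
    elseif c < 0x800 then
        return char(0xC0 + floor(c / 0x40),
                    0x80 + c % 0x40)
    elseif c < 0x10000 then
        return char(0xE0 + floor(c / 0x1000),
                    0x80 + floor(c / 0x40) % 0x40,
                    0x80 + c % 0x40)
    elseif c < 0x110000 then
        return char(0xF0 + floor(c / 0x40000),
                    0x80 + floor(c / 0x1000) % 0x40,
                    0x80 + floor(c / 0x40) % 0x40,
                    0x80 + c % 0x40)
    end
    return nil -- outside the Unicode range
end

print(codepointToUtf8(0x20AC)) -- the euro sign, three bytes
```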
htmlEntitiesToUtf8 (string)
Replace HTML entities with their UTF-8 encoded equivalent in text.

Supports only basic ones and those with numbers (no support for named entities like &eacute;).

Parameters:

  • string string text with HTML entities

Returns:

    string UTF-8 text
htmlToPlainText (text)
Convert simple HTML to plain text.

This may fail on complex HTML (with styles, scripts, comments), but should be fine enough with simple HTML as found in EPUB's <dc:description>.

Parameters:

Returns:

    string plain text
htmlToPlainTextIfHtml (text)
Convert HTML to plain text if text seems to be HTML. Detection of HTML is simple and may raise false positives or negatives, but seems quite good at guessing content type of text found in EPUB's <dc:description>.

Parameters:

  • text string the string with possibly some HTML

Returns:

    string cleaned text
htmlEscape (text)
Encode the HTML entities in a string

Parameters:

  • text string the string to escape

Taken from https://github.com/kernelsauce/turbo/blob/e4a35c2e3fb63f07464f8f8e17252bea3a029685/turbo/escape.lua#L58-L70
prettifyCSS (CSS)
Prettify a CSS stylesheet. Not perfect, but enough to make some ugly CSS readable. By default, each selector and each property is put on its own line. With condensed=true, condense each full declaration on a single line.

Parameters:

Returns:

    string the CSS prettified
urlEncode (text)
Encode a URL (also known as percent-encoding); see https://en.wikipedia.org/wiki/Percent-encoding

Parameters:

  • text string the string to encode

Returns:

    encoded string. Taken from https://gist.github.com/liukun/f9ce7d6d14fa45fe9b924a3eed5c3d99
urlDecode (text)
Decode URL (reverse process to util.urlEncode())

Parameters:

  • text string the string to decode

Returns:

    decoded string. Taken from https://gist.github.com/liukun/f9ce7d6d14fa45fe9b924a3eed5c3d99
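
Percent-encoding and its inverse can be sketched in a few lines of standalone Lua. The set of characters left unencoded here is the RFC 3986 "unreserved" set, which is an assumption; the linked gist and KOReader's functions may treat some characters (spaces, for instance) differently.

```lua
-- Standalone sketch of percent-encoding; not KOReader's source.
local function urlEncode(text)
    -- Encode every byte that is not a letter, digit, "-", ".", "_" or "~".
    return (text:gsub("[^%w%-%._~]", function(ch)
        return string.format("%%%02X", ch:byte())
    end))
end

local function urlDecode(text)
    -- Turn each %XX escape back into the byte it stands for.
    return (text:gsub("%%(%x%x)", function(hex)
        return string.char(tonumber(hex, 16))
    end))
end

local encoded = urlEncode("a b&c/d")
print(encoded)
print(urlDecode(encoded))
```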
checkLuaSyntax (text)
Check Lua syntax of string

Parameters:

Returns:

    string with parsing error, nil if syntax ok
stringStartsWith (str, start)
Simple startsWith string helper.

C.f., http://lua-users.org/wiki/StringRecipes.

Parameters:

Returns:

    bool true on success
stringEndsWith (str, ending)
Simple endsWith string helper.

Parameters:

Returns:

    bool true on success
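
Both helpers correspond to the well-known recipes on the lua-users wiki page cited above. A standalone sketch, with local names chosen for illustration rather than copied from KOReader:

```lua
-- Standalone sketch in the style of the lua-users StringRecipes page.
local function startsWith(str, start)
    return str:sub(1, #start) == start
end

local function endsWith(str, ending)
    -- str:sub(-0) would return the whole string, so handle "" explicitly.
    return ending == "" or str:sub(-#ending) == ending
end

print(startsWith("koreader.lua", "ko"), endsWith("koreader.lua", ".lua"))
```

Comparing substrings avoids Lua patterns entirely, so characters such as . or % in the prefix or suffix need no escaping.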
stringSearch (txt, str, start_pos)
Search a string in a text.

Parameters:

  • txt string or table Text (char list) to search in
  • str string String to search for
  • start_pos number Position number in text to start search from

Returns:

  1. number Position number or 0 if not found
  2. table Text char list
  3. table Search string char list
wrapMethod (target_table, target_field_name, new_func, before_callback)
Wrap (or replace) a table method with a custom method, in a revertable way. This allows you to extend the features of an existing module by modifying its internal methods, and then revert them back to normal later if necessary.

The most notable use-case for this is VirtualKeyboard's inputbox method wrapping to allow keyboards to add more complicated state-machines to modify how characters are input.

The returned table is the same table target_table[target_field_name] is set to. In addition to being callable, the new method has three sub-methods:

  • :revert() will un-wrap the method and revert it to the original state.

    Note that if a method is wrapped multiple times, reverting it will revert it to the state of the method when util.wrapMethod was called (and if called on the table returned from util.wrapMethod, that is the state when that particular util.wrapMethod was called).

  • :raw_call(...) will call the original method with the given arguments and return whatever it returns.

    This makes it more ergonomic to use the wrapped table methods in the case where you've replaced the regular function with your own implementation but you need to call the original functions inside your implementation.

  • :raw_method_call(...) will call the original method with the arguments (target_table, ...) and return whatever it returns. Note that the target_table used is the one associated with the util.wrapMethod call.

    This makes it more ergonomic to use the wrapped table methods in the case where you've replaced the regular function with your own implementation but you need to call the original functions inside your implementation.

    This is effectively short-hand for :raw_call(target_table, ...).

This is loosely based on busted/luassert's spies implementation (MIT). https://github.com/Olivine-Labs/luassert/blob/v1.7.11/src/spy.lua

Parameters:

  • target_table table The table whose method will be wrapped.
  • target_field_name string The name of the field to wrap.
  • new_func nil or func If non-nil, this function will be called instead of the original function after wrapping.
  • before_callback nil or func If non-nil, this function will be called (with the arguments (target_table, ...)) before the function is called.
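
The mechanism described above (a callable table that replaces the method and can restore it) can be approximated with a metatable. The following is a heavily simplified standalone sketch under that assumption: wrapMethodSketch is a hypothetical name, and the real util.wrapMethod may differ in details such as how repeated wrapping is tracked.

```lua
-- Hypothetical, simplified sketch of a revertable method wrapper.
local function wrapMethodSketch(target_table, target_field_name, new_func, before_callback)
    local original = target_table[target_field_name]
    local wrapped = {}

    function wrapped:revert()
        -- Restore whatever was in the field when this wrapper was created.
        target_table[target_field_name] = original
    end

    function wrapped:raw_call(...)
        return original(...)
    end

    function wrapped:raw_method_call(...)
        return original(target_table, ...)
    end

    setmetatable(wrapped, {
        -- Invoked for target_table:method(...); "..." starts with target_table.
        __call = function(_, ...)
            if before_callback then before_callback(...) end
            if new_func then return new_func(...) end
            return original(...)
        end,
    })

    target_table[target_field_name] = wrapped
    return wrapped
end

-- Usage: count calls to a method without changing what it does.
local counter = { n = 0 }
function counter:add(k)
    self.n = self.n + k
    return self.n
end

local calls = 0
local wrapper = wrapMethodSketch(counter, "add", nil, function() calls = calls + 1 end)
counter:add(2)    -- callback fires, then the original runs
wrapper:revert()
counter:add(3)    -- plain original again, callback no longer fires
print(counter.n, calls)
```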

Tables

t
Remove elements from an array, fast.

Swap & pop, like http://lua-users.org/lists/lua-l/2013-11/msg00027.html / https://stackoverflow.com/a/28942022, but preserving order. c.f., https://stackoverflow.com/a/53038524

Fields:

  • keep_cb function Filtering callback. Takes three arguments: table, index, new index. Returns true to keep the item. See link above for potential uses of the third argument.

Usage:

    local foo = { "a", "b", "c", "b", "d", "e" }
    local function drop_b(t, i, j)
        -- Discard any item with value "b"
        return t[i] ~= "b"
    end
    util.arrayRemove(foo, drop_b)
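
For readers curious how an order-preserving, in-place removal can work, here is a standalone sketch in the spirit of the last Stack Overflow answer cited above. It is an illustration, not KOReader's source; the callback signature (table, index, new index) follows the field description.

```lua
-- Standalone sketch of order-preserving in-place filtering.
local function arrayRemove(t, keep_cb)
    local j, n = 1, #t
    for i = 1, n do
        if keep_cb(t, i, j) then
            -- Slide the kept item down into the next free slot.
            if i ~= j then
                t[j] = t[i]
                t[i] = nil
            end
            j = j + 1
        else
            t[i] = nil
        end
    end
    return t
end

local foo = { "a", "b", "c", "b", "d", "e" }
arrayRemove(foo, function(t, i, j)
    return t[i] ~= "b"
end)
print(table.concat(foo, " "))
```

Each element is visited once and moved at most once, which is what makes this approach faster than calling table.remove repeatedly.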
args
Escape list for shell usage
t
Clear all the elements from an array without reassignment.

Fields

UTF8_CHAR_PATTERN
Pattern which matches a single well-formed UTF-8 character, including theoretical >4-byte extensions. Taken from https://www.lua.org/manual/5.4/manual.html#pdf-utf8.charpattern
generated by LDoc 1.5.0 Last updated 2025-01-24 21:45:56