Module `util`

This module contains miscellaneous helper functions for the KOReader frontend.

Functions

stripPunctuation (text)	Strips all punctuation marks and spaces from a string.
rtrim (s)	Remove trailing whitespace from string.
trim (s)	Remove leading & trailing whitespace from string.
cleanupSelectedText (text)	Variant tailored for text selection purposes (originally implemented in ReaderHighlight).
gsplit (str, pattern, capture, capture_empty_entity)	Splits a string by a pattern.
tableEquals (o1, o2, ignore_mt)	Compares values in two different tables.
tableDeepCopy (o)	Makes a deep copy of a table.
tableSize (t)	Returns number of keys in a table.
tableGetValue (t, ...)	Returns a value of a key, checks if all parent keys are not empty.
tableSetValue (t, value, ...)	Sets a value of a key, creates all parent keys if needed.
tableRemoveValue (t, ...)	Removes a key in a table, removes all empty parent keys.
arrayAppend (t1, t2)	Append all elements from t2 into t1.
arrayReverse (t)	Reverse array elements in-place in table t
arrayContains (t, v, callback)	Test whether t contains a value equal to v (or such a value that callback returns true), and if so, return the index.
arrayReferences (t, n, m)	Test whether array t contains a reference to array n (at any depth at or below m)
bsearch_left (array, value)	Perform a leftmost insertion binary search for `value` in a sorted (ascending) `array`.
bsearch_right (array, value)	Perform a rightmost insertion binary search for `value` in a sorted (ascending) `array`.
lastIndexOf (string, ch)	Gets last index of character in string (i.e., strrchr).
utf8Reverse (string)	Reverse the individual greater-than-single-byte characters
splitToChars (text)	Splits string into a list of UTF-8 characters.
isCJKChar (c)	Tests whether c is a CJK character
hasCJKChar (str)	Tests whether str contains CJK characters
splitToWords (text)	Split texts into a list of words, spaces and punctuation marks.
isSplittable (c, next_c, prev_c)	Test whether a string can be separated by this char for multi-line rendering.
getFilesystemType (path)	Gets filesystem type of a path.
calcFreeMem ()	Computes the currently available memory
findFiles (path, callback)	Recursively scan directory for files inside
isEmptyDir (path)	Checks if directory is empty.
fileExists (path)	Checks if the given path is a file.
pathExists (path)	Checks if the given path exists.
directoryExists (path)	Checks if the given directory exists.
makePath (path)	As `mkdir -p`.
removePath (path)	Remove as many of the empty directories specified in path, children-first.
removeFile (path)	As `rm`
getSafeFilename (str, path, limit)	Replaces characters that are invalid in filenames.
splitFilePathName (file)	Splits a file into its directory path and file name.
splitFileNameSuffix (file)	Splits a file name into its pure file name and suffix
getFileNameSuffix (filename)	Gets file extension
getScriptType (filename)	Companion helper function that returns the script's language, based on the file extension.
getFriendlySize (size, right_align)	Gets human friendly size as string
getFormattedSize (size)	Gets formatted size as string (1273334 => "1,273,334")
partialMD5 (filepath)	Calculate partial digest of an open file.
fixUtf8 (str, replacement)	Replaces invalid UTF-8 characters with a replacement string.
splitToArray (str, splitter, capture_empty_entity)	Splits input string with the splitter into a table.
unicodeCodepointToUtf8 (c)	Convert a Unicode codepoint (number) to a UTF-8 char.
htmlEntitiesToUtf8 (string)	Replace HTML entities with their UTF-8 encoded equivalent in text.
htmlToPlainText (text)	Convert simple HTML to plain text.
htmlToPlainTextIfHtml (text)	Convert HTML to plain text if text seems to be HTML.
htmlEscape (text)	Encode the HTML entities in a string
prettifyCSS (CSS)	Prettify a CSS stylesheet.
urlEncode (text)	Encode URL, also known as percent-encoding.
urlDecode (text)	Decode URL (reverse process to util.urlEncode())
checkLuaSyntax (text)	Checks the Lua syntax of a string.
stringStartsWith (str, start)	Simple startsWith string helper.
stringEndsWith (str, ending)	Simple endsWith string helper.
stringSearch (txt, str, start_pos)	Search a string in a text.
wrapMethod (target_table, target_field_name, new_func, before_callback)	Wrap (or replace) a table method with a custom method, in a revertable way.

Tables

t	Remove elements from an array, fast.
args	Escape list for shell usage
t	Clear all the elements from an array without reassignment.

Fields

UTF8_CHAR_PATTERN

Pattern which matches a single well-formed UTF-8 character, including theoretical >4-byte extensions.

Functions

stripPunctuation (text)

Strips all punctuation marks and spaces from a string.

Parameters:

text string the string to be stripped

Returns:

string

rtrim (s)

Remove trailing whitespace from string.

Parameters:

s string the string to be trimmed

Returns:

string

trim (s)

Remove leading & trailing whitespace from string.

Parameters:

s string the string to be trimmed

Returns:

string

cleanupSelectedText (text)

Variant tailored for text selection purposes (originally implemented in ReaderHighlight).

Parameters:

text string the text to be trimmed

Returns:

string

gsplit (str, pattern, capture, capture_empty_entity)

Splits a string by a pattern

Lua doesn't have a string.split() function, and most of the time you don't really need it because string.gmatch() is enough. However, string.gmatch() has one significant disadvantage: you can't split a string while matching both the delimited strings and the delimiters themselves without tracking positions and substrings. The gsplit function takes care of this problem.

Author: Peter Odding

License: MIT/X11

Source: http://snippets.luacode.org/snippets/Stringsplitting130

Parameters:

str string string to split
pattern the pattern to split against
capture boolean
capture_empty_entity boolean
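
The limitation described above can be illustrated with a small position-tracking splitter. This is only a sketch: split_with_delims is a hypothetical stand-in, not part of util, and it returns a table rather than an iterator. It shows the bookkeeping that string.gmatch() alone does not provide, namely collecting both the delimited pieces and the delimiters.

```lua
-- Hypothetical helper (not KOReader's gsplit): returns the pieces of `str`
-- interleaved with the delimiters matched by the Lua pattern `pattern`.
local function split_with_delims(str, pattern)
    local out, pos = {}, 1
    while true do
        local s, e = str:find(pattern, pos)
        -- Stop when nothing matches, or on an empty match (avoids looping forever).
        if not s or e < s then break end
        out[#out + 1] = str:sub(pos, s - 1) -- text before the delimiter
        out[#out + 1] = str:sub(s, e)       -- the delimiter itself
        pos = e + 1
    end
    out[#out + 1] = str:sub(pos)            -- trailing piece
    return out
end

local parts = split_with_delims("a, b,c", ",%s*")
```

Here parts holds "a", ", ", "b", ",", "c" in that order.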

tableEquals (o1, o2, ignore_mt)

Compares values in two different tables.

Source: https://stackoverflow.com/a/32660766/2470572

Parameters:

o1 Lua table
o2 Lua table
ignore_mt boolean

Returns:

boolean

tableDeepCopy (o)

Makes a deep copy of a table.

Source: https://stackoverflow.com/a/16077650/2470572

Parameters:

o Lua table

Returns:

Lua table
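
For readers unfamiliar with the technique, a deep copy recursively duplicates nested tables instead of sharing references. The sketch below is a generic illustration under stated assumptions (deep_copy is a hypothetical name; cycle handling and metatable sharing are choices made here, not confirmed behaviour of util.tableDeepCopy).

```lua
-- Hypothetical deep copy: nested tables are duplicated, other values are shared.
local function deep_copy(o, seen)
    if type(o) ~= "table" then return o end
    seen = seen or {}
    if seen[o] then return seen[o] end -- already copied (handles cycles)
    local copy = {}
    seen[o] = copy
    for k, v in pairs(o) do
        copy[deep_copy(k, seen)] = deep_copy(v, seen)
    end
    return setmetatable(copy, getmetatable(o))
end

local original = { list = { 1, 2 } }
local clone = deep_copy(original)
clone.list[1] = 9 -- does not affect original.list
```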

tableSize (t)

Returns number of keys in a table.

Parameters:

t Lua table

Returns:

int

tableGetValue (t, ...)

Returns a value of a key, checks if all parent keys are not empty.

Parameters:

t Lua table
... parent keys, starting from the upper level

Returns:

value

tableSetValue (t, value, ...)

Sets a value of a key, creates all parent keys if needed.

Parameters:

t Lua table
value value to be assigned to the last key
... parent keys, starting from the upper level
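
The nested-key access described for tableGetValue and tableSetValue can be sketched as follows. The names get_value and set_value are hypothetical, and the details (for example requiring at least one key) are assumptions rather than the module's actual code.

```lua
-- Walk down the keys; stop early with nil if a parent is missing.
local function get_value(t, ...)
    local cur = t
    for i = 1, select("#", ...) do
        if type(cur) ~= "table" then return nil end
        cur = cur[(select(i, ...))]
    end
    return cur
end

-- Create missing parent tables on the way down, then assign the last key.
local function set_value(t, value, ...)
    local n = select("#", ...)
    assert(n > 0, "at least one key is required")
    local cur = t
    for i = 1, n - 1 do
        local key = (select(i, ...))
        if type(cur[key]) ~= "table" then cur[key] = {} end
        cur = cur[key]
    end
    cur[(select(n, ...))] = value
end

local settings = {}
set_value(settings, 42, "reader", "font", "size")
local size = get_value(settings, "reader", "font", "size")      -- 42
local missing = get_value(settings, "reader", "margin", "left") -- nil
```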

tableRemoveValue (t, ...)

Removes a key in a table, removes all empty parent keys.

Parameters:

t Lua table
... parent keys, starting from the upper level

arrayAppend (t1, t2)

Append all elements from t2 into t1.

Parameters:

t1 Lua table
t2 Lua table

arrayReverse (t)

Reverse array elements in-place in table t

Parameters:

t Lua table

arrayContains (t, v, callback)

Test whether t contains a value equal to v (or such a value that callback returns true), and if so, return the index.

Parameters:

t Lua table
v
callback function (v1, v2)

arrayReferences (t, n, m)

Test whether array t contains a reference to array n (at any depth at or below m)

Parameters:

t Lua table (array only)
n Lua table (array only)
m integer Max nesting level

bsearch_left (array, value)

Perform a leftmost insertion binary search for value in a sorted (ascending) array.

Parameters:

array Lua table (array only, sorted, ascending, every value must match the type of value and support comparison operators)
value

Returns:

int leftmost insertion index of value in array.

bsearch_right (array, value)

Perform a rightmost insertion binary search for value in a sorted (ascending) array.

Parameters:

array Lua table (array only, sorted, ascending, every value must match the type of value and support comparison operators)
value

Returns:

int rightmost insertion index of value in array.
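
The two searches differ only in where equal elements are placed. A minimal sketch, assuming 1-based insertion indices in the style of Python's bisect_left and bisect_right (the exact return convention of the util functions is not stated on this page; lower_bound and upper_bound are hypothetical names):

```lua
-- First index at which `value` could be inserted while keeping `array` sorted.
local function lower_bound(array, value)
    local lo, hi = 1, #array + 1
    while lo < hi do
        local mid = math.floor((lo + hi) / 2)
        if array[mid] < value then lo = mid + 1 else hi = mid end
    end
    return lo
end

-- Last index at which `value` could be inserted while keeping `array` sorted.
local function upper_bound(array, value)
    local lo, hi = 1, #array + 1
    while lo < hi do
        local mid = math.floor((lo + hi) / 2)
        if value < array[mid] then hi = mid else lo = mid + 1 end
    end
    return lo
end

local sorted = { 1, 2, 2, 2, 5 }
local left = lower_bound(sorted, 2)  -- 2: before the run of 2s
local right = upper_bound(sorted, 2) -- 5: after the run of 2s
```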

lastIndexOf (string, ch)

Gets last index of character in string (i.e., strrchr)

Returns the index within this string of the last occurrence of the specified character or -1 if the character does not occur.

To find . you need to escape it.

Parameters:

string string
ch string

Returns:

int
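
The escaping note above suggests that ch is treated as a Lua pattern. The sketch below illustrates why that matters; last_index_of is a hypothetical stand-in written for this example, not the module's code.

```lua
-- Greedy ".*" pushes the position capture "()" to the last place `ch` matches.
local function last_index_of(s, ch)
    local pos = s:match(".*()" .. ch)
    return pos or -1
end

local escaped = last_index_of("a.b.c", "%.")  -- 4: the last literal dot
local unescaped = last_index_of("a.b.c", ".") -- 5: "." matches any character
local none = last_index_of("abc", "%.")       -- -1: no dot present
```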

utf8Reverse (string)

Reverse the individual greater-than-single-byte characters

Parameters:

string string to reverse

Taken from https://github.com/blitmap/lua-utf8-simple#utf8reverses

splitToChars (text)

Splits string into a list of UTF-8 characters.

Parameters:

text string the string to be split.

Returns:

table
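
Splitting on UTF-8 character boundaries can be sketched with a byte-class pattern. The pattern below is a common simplified idiom (one non-continuation byte followed by any continuation bytes); it is an assumption for illustration and is not the module's UTF8_CHAR_PATTERN. split_to_chars is a hypothetical name.

```lua
-- Collect each UTF-8 character (lead byte plus continuation bytes 0x80-0xBF).
local function split_to_chars(text)
    local chars = {}
    for ch in text:gmatch("[^\128-\191][\128-\191]*") do
        chars[#chars + 1] = ch
    end
    return chars
end

-- "\195\169" is the two-byte UTF-8 encoding of U+00E9 (e with acute accent).
local chars = split_to_chars("h\195\169llo")
```

The input is 6 bytes long, but chars contains 5 entries, and chars[2] is the two-byte sequence.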

isCJKChar (c)

Tests whether c is a CJK character

Parameters:

c string

Returns:

boolean

hasCJKChar (str)

Tests whether str contains CJK characters

Parameters:

str string

Returns:

boolean

splitToWords (text)

Split texts into a list of words, spaces and punctuation marks.

Parameters:

text string text to split

Returns:

table

isSplittable (c, next_c, prev_c)

Test whether a string can be separated by this char for multi-line rendering. Optional next or prev chars may be provided to help make the decision.

Parameters:

c string
next_c string
prev_c string

Returns:

boolean

getFilesystemType (path)

Gets filesystem type of a path.

Checks if the path occurs in /proc/mounts

Parameters:

path string an absolute path

Returns:

string

calcFreeMem ()

Computes the currently available memory

Returns:

tuple

findFiles (path, callback)

Recursively scan directory for files inside

Parameters:

path string
callback function (fullpath, name, attr)

isEmptyDir (path)

Checks if directory is empty.

Parameters:

path string

Returns:

bool

fileExists (path)

Checks if the given path is a file.

Parameters:

path string

Returns:

bool

pathExists (path)

Checks if the given path exists. Doesn't care if it's a file or directory.

Parameters:

path string

Returns:

bool

directoryExists (path)

Checks if the given directory exists.

Parameters:

path

makePath (path)

As mkdir -p. Unlike lfs.mkdir(), does not error if the directory already exists, and creates intermediate directories as needed.

Parameters:

path string the directory to create

Returns:

bool

removePath (path)

Remove as many of the empty directories specified in path, children-first. Does not fail if the directory is already gone.

Parameters:

path string the directory tree to prune

Returns:

bool

removeFile (path)

As rm

Parameters:

path string of the file to remove

Returns:

bool

getSafeFilename (str, path, limit)

Replaces characters that are invalid in filenames.

Replaces the characters \/:*?"<>| with an _ unless an optional path is provided. These characters are problematic on Windows filesystems. On Linux only the / poses a problem.

If an optional path is provided, util.getFilesystemType() will be used to determine whether stricter VFAT restrictions should be applied.

Parameters:

str string
path string
limit integer

Returns:

string
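
The basic replacement rule can be sketched with a single character class. This is a simplified illustration only: safe_filename is a hypothetical name, and the sketch ignores the optional path and limit parameters and the filesystem-dependent behaviour described above.

```lua
-- Replace each of \ / : * ? " < > | with an underscore.
local function safe_filename(str)
    -- Inside a character class, * and ? are literal; the backslash is
    -- escaped once for the Lua string literal.
    return (str:gsub('[\\/:*?"<>|]', "_"))
end

local cleaned = safe_filename("a/b:c*d?.txt") -- "a_b_c_d_.txt"
```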

splitFilePathName (file)

Splits a file into its directory path and file name. If the given path has a trailing /, returns the entire path as the directory path and "" as the file name.

Parameters:

file string

Returns:

string

splitFileNameSuffix (file)

Splits a file name into its pure file name and suffix

Parameters:

file string

Returns:

string
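
Both splits can be sketched with greedy Lua patterns. The helper names are hypothetical, and details such as keeping the trailing / on the directory part, or what happens with names that have no dot, are assumptions that go beyond what this page states.

```lua
-- Directory part (up to and including the last "/") and file name.
local function split_path_name(file)
    local dir, name = file:match("^(.*/)([^/]*)$")
    if not dir then return "", file end -- no "/" at all
    return dir, name
end

-- Pure name and suffix, split at the last ".".
local function split_name_suffix(file)
    local name, suffix = file:match("^(.*)%.([^.]*)$")
    if not name then return file, "" end -- no "." at all
    return name, suffix
end

local dir, name = split_path_name("/books/novel.epub")  -- "/books/", "novel.epub"
local only_dir, empty = split_path_name("/books/")      -- "/books/", ""
local base, suffix = split_name_suffix("archive.tar.gz") -- "archive.tar", "gz"
```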

getFileNameSuffix (filename)

Gets file extension

Parameters:

filename string

Returns:

string

getScriptType (filename)

Companion helper function that returns the script's language, based on the file extension.

Parameters:

filename string

Returns:

string

getFriendlySize (size, right_align)

Gets human friendly size as string

Parameters:

size integer (bytes)
right_align boolean (by padding with spaces on the left)

Returns:

string

getFormattedSize (size)

Gets formatted size as string (1273334 => "1,273,334")

Parameters:

size integer (bytes)

Returns:

string
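
The grouping shown in the example (1273334 => "1,273,334") can be reproduced with a reverse-and-group trick. This is a sketch under the assumption of a non-negative integer input; formatted_size is a hypothetical name, not the module's implementation.

```lua
-- Insert a comma after every three digits, counting from the right.
local function formatted_size(size)
    local digits = string.format("%d", size)
    local grouped = digits:reverse():gsub("(%d%d%d)", "%1,"):reverse()
    -- A digit count divisible by three leaves a leading comma; strip it.
    return (grouped:gsub("^,", ""))
end

local text = formatted_size(1273334) -- "1,273,334"
```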

partialMD5 (filepath)

Calculate partial digest of an open file. Since only PDF documents can be modified by KOReader, which appends data at the end of the file when highlighting, the digest uses a non-even sampling algorithm that samples with larger weight at the file head and much smaller weight at the file tail. This reduces the probability that appended data changes the digest value. Note that if the PDF file size is around 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864, 268435456 or 1073741824, appending data by highlighting in KOReader may change the digest value.

Parameters:

filepath
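
The sizes listed above follow a geometric pattern: each one is 1024 × 4^i for i = 0..10. The sketch below only regenerates that list. Reading these values as the positions where samples are taken is an assumption, since this page does not spell out the sampling algorithm.

```lua
-- Regenerate the critical sizes from the description: 1024 * 4^i, i = 0..10.
local critical_sizes = {}
for i = 0, 10 do
    critical_sizes[#critical_sizes + 1] = 1024 * 4 ^ i
end
-- First entry is 1024, last entry is 1073741824 (2^30).
```

The spacing grows by a factor of four at each step, which matches the description of heavier sampling near the head of the file.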

fixUtf8 (str, replacement)

Replaces invalid UTF-8 characters with a replacement string.

Based on http://notebook.kulchenko.com/programming/fixing-malformed-utf8-in-lua. c.f., FixUTF8 @ https://github.com/pkulchenko/ZeroBraneStudio/blob/master/src/util.lua.

Parameters:

str string the string to be checked for invalid characters
replacement string the string to replace invalid characters with

Returns:

string

splitToArray (str, splitter, capture_empty_entity)

Splits input string with the splitter into a table. This function ignores the last empty entity.

Parameters:

str string the string to be split
splitter string
capture_empty_entity boolean

Returns:

table

unicodeCodepointToUtf8 (c)

Convert a Unicode codepoint (number) to a UTF-8 char.

C.f., https://stackoverflow.com/a/4609989 and https://stackoverflow.com/a/38492214.

See utf8charcode in ffi/util for a decoder.

Parameters:

c integer Unicode codepoint

Returns:

string
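
The encoding itself is plain arithmetic on the codepoint. The sketch below follows the standard UTF-8 layout for codepoints up to U+10FFFF; codepoint_to_utf8 is a hypothetical name, and returning nil for out-of-range input is a choice made for this example.

```lua
-- Encode one Unicode codepoint as a 1- to 4-byte UTF-8 string.
local function codepoint_to_utf8(c)
    if c < 0x80 then
        return string.char(c)
    elseif c < 0x800 then
        return string.char(0xC0 + math.floor(c / 0x40),
                           0x80 + c % 0x40)
    elseif c < 0x10000 then
        return string.char(0xE0 + math.floor(c / 0x1000),
                           0x80 + math.floor(c / 0x40) % 0x40,
                           0x80 + c % 0x40)
    elseif c < 0x110000 then
        return string.char(0xF0 + math.floor(c / 0x40000),
                           0x80 + math.floor(c / 0x1000) % 0x40,
                           0x80 + math.floor(c / 0x40) % 0x40,
                           0x80 + c % 0x40)
    end
    return nil -- outside the Unicode range
end

local e_acute = codepoint_to_utf8(0xE9) -- bytes 195, 169
local euro = codepoint_to_utf8(0x20AC)  -- bytes 226, 130, 172
```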

htmlEntitiesToUtf8 (string)

Replace HTML entities with their UTF-8 encoded equivalent in text.

Supports only basic ones and those with numbers (no support for named entities like &eacute;).

Parameters:

string string text with HTML entities

Returns:

string

htmlToPlainText (text)

Convert simple HTML to plain text.

This may fail on complex HTML (with styles, scripts, comments), but should be fine enough with simple HTML as found in EPUB's <dc:description>.

Parameters:

text string HTML text

Returns:

string

htmlToPlainTextIfHtml (text)

Convert HTML to plain text if text seems to be HTML. Detection of HTML is simple and may raise false positives or negatives, but seems quite good at guessing content type of text found in EPUB's <dc:description>.

Parameters:

text string the string with possibly some HTML

Returns:

string

htmlEscape (text)

Encode the HTML entities in a string

Parameters:

text string the string to escape

Taken from https://github.com/kernelsauce/turbo/blob/e4a35c2e3fb63f07464f8f8e17252bea3a029685/turbo/escape.lua#L58-L70
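
Entity encoding of this kind is typically a table-driven substitution. The sketch below is a generic illustration: html_escape is a hypothetical name, and the set of escaped characters is an assumption that may not match the list used by util.htmlEscape.

```lua
-- Map each special character to its HTML entity.
local ENTITIES = {
    ["&"] = "&amp;",
    ["<"] = "&lt;",
    [">"] = "&gt;",
    ['"'] = "&quot;",
    ["'"] = "&#39;",
}

local function html_escape(text)
    -- gsub looks each matched character up in the table.
    return (text:gsub("[&<>\"']", ENTITIES))
end

local escaped_html = html_escape('<a href="x">T&C</a>')
-- "&lt;a href=&quot;x&quot;&gt;T&amp;C&lt;/a&gt;"
```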

prettifyCSS (CSS)

Prettify a CSS stylesheet. Not perfect, but enough to make some ugly CSS readable. By default, each selector and each property is put on its own line. With condensed=true, condense each full declaration on a single line.

Parameters:

CSS string the CSS stylesheet text

Returns:

string

urlEncode (text)

Encode URL, also known as percent-encoding. See https://en.wikipedia.org/wiki/Percent-encoding

Parameters:

text string the string to encode

Returns:

string the encoded string

urlDecode (text)

Decode URL (reverse process to util.urlEncode())

Parameters:

text string the string to decode

Returns:

string the decoded string
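
Percent-encoding and its reverse can be sketched as a pair of gsub calls. The helpers below are hypothetical stand-ins: they leave only the RFC 3986 unreserved characters untouched, which is an assumption and may differ from the character set util.urlEncode preserves.

```lua
-- Encode every byte outside A-Z, a-z, 0-9, "-", "_", "." and "~" as %XX.
local function url_encode(text)
    return (text:gsub("[^%w%-_%.~]", function(ch)
        return string.format("%%%02X", ch:byte())
    end))
end

-- Turn each %XX sequence back into the byte it represents.
local function url_decode(text)
    return (text:gsub("%%(%x%x)", function(hex)
        return string.char(tonumber(hex, 16))
    end))
end

local encoded = url_encode("a b&c/d") -- "a%20b%26c%2Fd"
local decoded = url_decode(encoded)   -- "a b&c/d"
```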

checkLuaSyntax (text)

Checks the Lua syntax of a string.

Parameters:

text string lua code text

Returns:

string
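
A syntax check can be done by compiling the text without running it. A minimal sketch, assuming the convention of returning the error message on failure and nil on success (that convention, and the name check_syntax, are assumptions not confirmed by this page):

```lua
-- Compile `text` without executing it; report the parser error, if any.
local function check_syntax(text)
    local loader = loadstring or load -- Lua 5.1/LuaJIT vs. Lua 5.2+
    local chunk, err = loader(text)
    if chunk then return nil end
    return err
end

local ok_result = check_syntax("return 1 + 1") -- nil: the code parses
local bad_result = check_syntax("return +")    -- a string describing the error
```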

stringStartsWith (str, start)

Simple startsWith string helper.

C.f., http://lua-users.org/wiki/StringRecipes.

Parameters:

str string source string
start string string to match

Returns:

bool

stringEndsWith (str, ending)

Simple endsWith string helper.

Parameters:

str string source string
ending string string to match

Returns:

bool
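
Both helpers reduce to a substring comparison, in the style of the StringRecipes page referenced for stringStartsWith. The names starts_with and ends_with are hypothetical, and treating an empty ending as always matching is an assumption made for this sketch.

```lua
local function starts_with(str, start)
    return str:sub(1, #start) == start
end

local function ends_with(str, ending)
    -- str:sub(-0) would return the whole string, so handle "" explicitly.
    return ending == "" or str:sub(-#ending) == ending
end

local is_epub = ends_with("novel.epub", ".epub")         -- true
local is_hidden = starts_with("novel.epub", ".")         -- false
```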

stringSearch (txt, str, start_pos)

Search a string in a text.

Parameters:

txt string or table Text (char list) to search in
str string String to search for
start_pos number Position number in text to start search from

Returns:

number Position number or 0 if not found
table Text char list
table Search string char list

wrapMethod (target_table, target_field_name, new_func, before_callback)

Wrap (or replace) a table method with a custom method, in a revertable way. This allows you to extend the features of an existing module by modifying its internal methods, and then revert them back to normal later if necessary.

The most notable use-case for this is VirtualKeyboard's inputbox method wrapping to allow keyboards to add more complicated state-machines to modify how characters are input.

The returned table is the same table target_table[target_field_name] is set to. In addition to being callable, the new method has three sub-methods:

:revert() will un-wrap the method and revert it to the original state.

Note that if a method is wrapped multiple times, reverting it will revert it to the state of the method when util.wrapMethod was called (and if called on the table returned from util.wrapMethod, that is the state when that particular util.wrapMethod was called).

:raw_call(...) will call the original method with the given arguments and return whatever it returns.

This makes it more ergonomic to use the wrapped table methods in the case where you've replaced the regular function with your own implementation but you need to call the original functions inside your implementation.

:raw_method_call(...) will call the original method with the arguments (target_table, ...) and return whatever it returns. Note that the target_table used is the one associated with the util.wrapMethod call.

This makes it more ergonomic to use the wrapped table methods in the case where you've replaced the regular function with your own implementation but you need to call the original functions inside your implementation.

This is effectively short-hand for :raw_call(target_table, ...).

This is loosely based on busted/luassert's spies implementation (MIT). https://github.com/Olivine-Labs/luassert/blob/v1.7.11/src/spy.lua

Parameters:

target_table table The table whose method will be wrapped.
target_field_name string The name of the field to wrap.
new_func nil or func If non-nil, this function will be called instead of the original function after wrapping.
before_callback nil or func If non-nil, this function will be called (with the arguments (target_table, ...)) before the function is called.
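
The mechanism described above (a callable table that remembers the original function) can be sketched in a few lines. This is a simplified illustration written for this page, not the module's code: wrap_method is a hypothetical name, and the counter table exists only for the demonstration.

```lua
local function wrap_method(target_table, target_field_name, new_func, before_callback)
    local old_func = target_table[target_field_name]
    local wrapped = {}
    function wrapped:revert()
        -- Restore whatever was there when this wrap was created.
        target_table[target_field_name] = old_func
    end
    function wrapped:raw_call(...)
        return old_func(...)
    end
    function wrapped:raw_method_call(...)
        return old_func(target_table, ...)
    end
    setmetatable(wrapped, {
        -- Calling the table runs the callback, then the replacement or original.
        __call = function(_, ...)
            if before_callback then before_callback(target_table, ...) end
            if new_func then return new_func(...) end
            return old_func(...)
        end,
    })
    target_table[target_field_name] = wrapped
    return wrapped
end

-- Demonstration on a throwaway table.
local counter = { n = 0 }
function counter:add(k)
    self.n = self.n + k
    return self.n
end

local wrapped
wrapped = wrap_method(counter, "add", function(self, k)
    -- Replacement doubles the increment, delegating to the original.
    return wrapped:raw_call(self, k * 2)
end)

local while_wrapped = counter:add(1) -- 2: the replacement added 2
wrapped:revert()
local after_revert = counter:add(1)  -- 3: the original added 1
```

Because the wrapper is a table with a __call metamethod, it can carry the :revert() and :raw_call() helpers while still being invoked like the method it replaces.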

Tables

t

Remove elements from an array, fast.

Swap & pop, like http://lua-users.org/lists/lua-l/2013-11/msg00027.html / https://stackoverflow.com/a/28942022, but preserving order. c.f., https://stackoverflow.com/a/53038524

Fields:

keep_cb function Filtering callback. Takes three arguments: table, index, new index. Returns true to keep the item. See link above for potential uses of the third argument.

Usage:

local foo = { "a", "b", "c", "b", "d", "e" }
local function drop_b(t, i, j)
    -- Discard any item with value "b"
    return t[i] ~= "b"
end
util.arrayRemove(foo, drop_b)
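
An order-preserving, in-place filter of the kind described above can be sketched as follows. This follows the general approach of the linked Stack Overflow answer; array_remove is a hypothetical stand-in, not the module's implementation.

```lua
-- Keep items for which keep_cb(t, i, j) is true, compacting them to the front.
local function array_remove(t, keep_cb)
    local j, n = 1, #t
    for i = 1, n do
        if keep_cb(t, i, j) then
            if i ~= j then
                t[j] = t[i] -- move the kept item down to its new slot
                t[i] = nil
            end
            j = j + 1
        else
            t[i] = nil
        end
    end
    return t
end

local items = { "a", "b", "c", "b", "d", "e" }
array_remove(items, function(t, i)
    return t[i] ~= "b"
end)
-- items is now { "a", "c", "d", "e" }
```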

args

Escape list for shell usage

t

Clear all the elements from an array without reassignment.

Fields

UTF8_CHAR_PATTERN: Pattern which matches a single well-formed UTF-8 character, including theoretical >4-byte extensions. Taken from https://www.lua.org/manual/5.4/manual.html#pdf-utf8.charpattern
