Module util

This module contains miscellaneous helper functions for the KOReader frontend.

Functions

stripPunctuation (text) Strips all punctuation marks and spaces from a string.
gsplit (str, pattern, capture, capture_empty_entity) Splits a string by a pattern

Lua doesn't have a string.split() function and most of the time you don't really need it because string.gmatch() is enough.

secondsToClock (seconds, withoutSeconds) Converts seconds to a clock string.
secondsToHClock (seconds, withoutSeconds, hmsFormat) Converts seconds to a period of time string.
tableEquals (o1, o2, ignore_mt) Compares values in two different tables.
tableDeepCopy (o) Makes a deep copy of a table.
tableSize (t) Returns number of keys in a table.
arrayAppend (t1, t2) Append all elements from t2 into t1.
lastIndexOf (string, ch) Gets last index of character in string (i.e., strrchr)

Returns the index within this string of the last occurrence of the specified character or -1 if the character does not occur.

utf8Reverse (string) Reverse the individual greater-than-single-byte characters
splitToChars (text) Splits string into a list of UTF-8 characters.
isCJKChar (c) Tests whether c is a CJK character
hasCJKChar (str) Tests whether str contains CJK characters
splitToWords (text) Split texts into a list of words, spaces and punctuation marks.
isSplittable (c, next_c, prev_c) Test whether a string can be separated by this char for multi-line rendering.
getFilesystemType (path) Gets filesystem type of a path.
isEmptyDir (path) Checks if directory is empty.
fileExists (path) Checks if the given path is a file.
pathExists (path) Checks if the given path exists.
makePath (path) As mkdir -p.
removeFile (path) As rm
getSafeFilename (str, path, limit) Replaces characters that are invalid in filenames.
splitFilePathName (file) Splits a file into its directory path and file name.
splitFileNameSuffix (file) Splits a file name into its pure file name and suffix
getFileNameSuffix (filename) Gets file extension
getScriptType (filename) Companion helper function that returns the script's language, based on the file extension.
getFriendlySize (size, right_align) Gets human friendly size as string
getFormattedSize (size) Gets formatted size as string (1273334 => "1,273,334")
fixUtf8 (str, replacement) Replaces invalid UTF-8 characters with a replacement string.
splitToArray (str, splitter, capture_empty_entity) Splits input string with the splitter into a table.
unicodeCodepointToUtf8 (c) Converts a Unicode codepoint (number) to a UTF-8 char. C.f. https://stackoverflow.com/a/4609989 and https://stackoverflow.com/a/38492214.

See utf8charcode in ffi/util for a decoder.

htmlEntitiesToUtf8 (string) Replace HTML entities with their UTF-8 encoded equivalent in text.
htmlToPlainText (text) Convert simple HTML to plain text.
htmlToPlainTextIfHtml (text) Converts HTML to plain text if the text seems to be HTML. Detection of HTML is simple and may produce false positives or negatives, but seems quite good at guessing the content type of text found in EPUB's <dc:description>.
htmlEscape (text) Encode the HTML entities in a string
prettifyCSS (CSS) Prettifies a CSS stylesheet. Not perfect, but enough to make some ugly CSS readable.
urlEncode (text) Encodes a URL (also known as percent-encoding); see https://en.wikipedia.org/wiki/Percent-encoding
urlDecode (text) Decode URL (reverse process to util.urlEncode())
checkLuaSyntax (text) Checks the Lua syntax of a string.
unpackArchive (archive, extract_to) Unpack an archive.

Tables

args Escape list for shell usage
t Clear all the elements from a table without reassignment.
t Dumps a table into a file.


Functions

stripPunctuation (text)
Strips all punctuation marks and spaces from a string.

Parameters:

  • text string the string to be stripped

Returns:

    string stripped text
gsplit (str, pattern, capture, capture_empty_entity)
Splits a string by a pattern

Lua doesn't have a string.split() function and most of the time you don't really need it because string.gmatch() is enough. However string.gmatch() has one significant disadvantage for me: You can't split a string while matching both the delimited strings and the delimiters themselves without tracking positions and substrings. The gsplit function below takes care of this problem.

Author: Peter Odding

License: MIT/X11

Source: http://snippets.luacode.org/snippets/Stringsplitting130

Parameters:

  • str string string to split
  • pattern the pattern to split against
  • capture bool
  • capture_empty_entity bool
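
The problem described above, getting both the delimited pieces and the delimiters themselves, can be sketched in plain Lua by tracking positions with string.find(). This is a standalone illustration and not the util.gsplit source; the helper name split_with_delimiters is hypothetical and it returns a table rather than an iterator.

```lua
-- Hypothetical helper (not part of util): returns pieces and delimiters in order.
local function split_with_delimiters(str, pattern)
    local result = {}
    local pos = 1
    while true do
        local s, e = string.find(str, pattern, pos)
        -- Stop when nothing matches, or on an empty match (avoids an infinite loop).
        if not s or e < s then break end
        table.insert(result, string.sub(str, pos, s - 1)) -- text before the delimiter
        table.insert(result, string.sub(str, s, e))       -- the delimiter itself
        pos = e + 1
    end
    table.insert(result, string.sub(str, pos)) -- trailing text after the last delimiter
    return result
end

local parts = split_with_delimiters("a, b;c", "[,;]%s*")
print(table.concat(parts, "|"))
```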
secondsToClock (seconds, withoutSeconds)
Converts seconds to a clock string.

Source: https://gist.github.com/jesseadams/791673

Parameters:

  • seconds int number of seconds
  • withoutSeconds bool if true 00:00, if false 00:00:00

Returns:

    string clock string in the form of 00:00 or 00:00:00
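
As a rough illustration of the two output forms, a standalone conversion might look like the sketch below. The name seconds_to_clock is hypothetical, and the sketch simply truncates the seconds when they are omitted; whether the real function rounds instead is not stated here.

```lua
-- Hypothetical sketch, not the util.secondsToClock source.
local function seconds_to_clock(seconds, without_seconds)
    seconds = math.floor(tonumber(seconds) or 0)
    if seconds < 0 then seconds = 0 end
    local hours = math.floor(seconds / 3600)
    local mins = math.floor(seconds / 60) % 60
    local secs = seconds % 60
    if without_seconds then
        -- Assumption: seconds are dropped, not rounded.
        return string.format("%02d:%02d", hours, mins)
    end
    return string.format("%02d:%02d:%02d", hours, mins, secs)
end

print(seconds_to_clock(5410))       -- 1 hour, 30 minutes, 10 seconds
print(seconds_to_clock(5410, true)) -- same duration without the seconds field
```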
secondsToHClock (seconds, withoutSeconds, hmsFormat)
Converts seconds to a period of time string.

Parameters:

  • seconds int number of seconds
  • withoutSeconds bool if true 1h30', if false 1h30'10''
  • hmsFormat bool if true, format as 1h30m10s

Returns:

    string clock string in the form of 1h30' or 1h30'10''
tableEquals (o1, o2, ignore_mt)
Compares values in two different tables.

Source: https://stackoverflow.com/a/32660766/2470572

Parameters:

  • o1 Lua table
  • o2 Lua table
  • ignore_mt bool

Returns:

    boolean
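
A minimal sketch of a recursive comparison, written in the spirit of the linked answer rather than copied from util. The name table_equals is hypothetical, and the handling of ignore_mt (skipping a custom __eq metamethod) is an assumption based on the parameter name.

```lua
-- Hypothetical standalone helper, not the util.tableEquals source.
local function table_equals(o1, o2, ignore_mt)
    if o1 == o2 then return true end
    if type(o1) ~= "table" or type(o2) ~= "table" then return false end
    if not ignore_mt then
        local mt = getmetatable(o1)
        -- Defer to a custom __eq metamethod when one is defined.
        if mt and mt.__eq then return o1 == o2 end
    end
    for k, v1 in pairs(o1) do
        local v2 = o2[k]
        if v2 == nil or not table_equals(v1, v2, ignore_mt) then return false end
    end
    -- Make sure o2 has no keys that o1 lacks.
    for k in pairs(o2) do
        if o1[k] == nil then return false end
    end
    return true
end

print(table_equals({ 1, { a = 2 } }, { 1, { a = 2 } }))
```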
tableDeepCopy (o)
Makes a deep copy of a table.

Source: https://stackoverflow.com/a/16077650/2470572

Parameters:

  • o Lua table

Returns:

    Lua table
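
One way a deep copy could be written is sketched below. The name deep_copy and the seen table used to survive cyclic references are assumptions of this sketch, not details confirmed for util.tableDeepCopy.

```lua
-- Hypothetical sketch, not the util.tableDeepCopy source.
local function deep_copy(o, seen)
    if type(o) ~= "table" then return o end
    seen = seen or {}
    -- Reuse the copy already made for this table (handles cycles and shared subtables).
    if seen[o] then return seen[o] end
    local copy = {}
    seen[o] = copy
    for k, v in pairs(o) do
        copy[k] = deep_copy(v, seen)
    end
    return setmetatable(copy, getmetatable(o))
end

local original = { 1, { 2, 3 }, meta = { title = "x" } }
local clone = deep_copy(original)
clone[2][1] = 99 -- changing the clone leaves the original untouched
print(original[2][1], clone[2][1])
```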
tableSize (t)
Returns number of keys in a table.

Parameters:

  • t Lua table

Returns:

    int number of keys in table t
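
A helper like this is useful because Lua's length operator only counts the array part of a table. The sketch below uses a hypothetical table_size function to show the difference.

```lua
-- Hypothetical sketch: count every key, not just the array part.
local function table_size(t)
    local count = 0
    for _ in pairs(t) do
        count = count + 1
    end
    return count
end

local t = { "a", "b", name = "x" }
-- The length operator sees the two array entries; the key count also includes "name".
print(#t, table_size(t))
```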
arrayAppend (t1, t2)
Append all elements from t2 into t1.

Parameters:

  • t1 Lua table
  • t2 Lua table
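
A minimal sketch of an in-place append, assuming both arguments are array-like tables. The name array_append is hypothetical.

```lua
-- Hypothetical sketch: t1 is modified in place, t2 is left unchanged.
local function array_append(t1, t2)
    for _, v in ipairs(t2) do
        table.insert(t1, v)
    end
end

local a = { 1, 2 }
array_append(a, { 3, 4 })
print(table.concat(a, ","))
```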
lastIndexOf (string, ch)
Gets last index of character in string (i.e., strrchr)

Returns the index within this string of the last occurrence of the specified character or -1 if the character does not occur.

To find . you need to escape it.

Parameters:

Returns:

    int last occurrence or -1 if not found
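
The note about escaping . suggests that ch is treated as a Lua pattern. A sketch under that assumption, using a greedy match and a position capture (last_index_of is a hypothetical name):

```lua
-- Hypothetical sketch: ch is used as a Lua pattern, so magic characters need a % escape.
local function last_index_of(str, ch)
    -- ".*" is greedy, so the "()" position capture lands on the last occurrence.
    local index = string.match(str, ".*()" .. ch)
    return index or -1
end

print(last_index_of("a/b/c", "/"))
print(last_index_of("file.tar.gz", "%.")) -- the dot must be escaped
print(last_index_of("abc", "x"))
```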
utf8Reverse (string)
Reverse the individual greater-than-single-byte characters

Parameters:

splitToChars (text)
Splits string into a list of UTF-8 characters.

Parameters:

  • text string the string to be split.

Returns:

    table list of UTF-8 chars
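
Splitting on UTF-8 character boundaries can be sketched with a single pattern: one non-continuation byte followed by any continuation bytes (0x80 to 0xBF). This is an illustration that assumes well-formed input; split_to_chars is a hypothetical name.

```lua
-- Hypothetical sketch, assuming the input is valid UTF-8.
local function split_to_chars(text)
    local chars = {}
    -- A lead byte (anything outside 128..191) plus its continuation bytes.
    for ch in string.gmatch(text, "[^\128-\191][\128-\191]*") do
        table.insert(chars, ch)
    end
    return chars
end

-- "h\195\169llo" is "héllo" with é written as its two UTF-8 bytes.
local chars = split_to_chars("h\195\169llo")
print(#chars)
```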
isCJKChar (c)
Tests whether c is a CJK character

Parameters:

Returns:

    boolean true if CJK
hasCJKChar (str)
Tests whether str contains CJK characters

Parameters:

Returns:

    boolean true if CJK
splitToWords (text)
Split texts into a list of words, spaces and punctuation marks.

Parameters:

Returns:

    table list of words, spaces and punctuation marks
isSplittable (c, next_c, prev_c)
Test whether a string can be separated by this char for multi-line rendering. Optional next or prev chars may be provided to help make the decision

Parameters:

Returns:

    boolean true if splittable, false if not
getFilesystemType (path)
Gets filesystem type of a path.

Checks if the path occurs in /proc/mounts

Parameters:

  • path string an absolute path

Returns:

    string filesystem type
isEmptyDir (path)
Checks if directory is empty.

Parameters:

Returns:

    bool
fileExists (path)
Checks if the given path is a file.

Parameters:

Returns:

    bool
pathExists (path)
Checks if the given path exists. Doesn't care if it's a file or directory.

Parameters:

Returns:

    bool
makePath (path)
As mkdir -p. Unlike lfs.mkdir(), does not error if the directory already exists, and creates intermediate directories as needed.

Parameters:

  • path string the directory to create

Returns:

    bool true on success; nil, err_message on error
removeFile (path)
As rm

Parameters:

  • path string of the file to remove

Returns:

    bool true on success; nil, err_message on error
getSafeFilename (str, path, limit)
Replaces characters that are invalid in filenames.

Replaces the characters \/:*?"<>| with an _ unless an optional path is provided. These characters are problematic on Windows filesystems. On Linux only the / poses a problem.

If an optional path is provided, util.getFilesystemType() will be used to determine whether stricter VFAT restrictions should be applied.

Parameters:

Returns:

    string safe filename
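
The strict replacement described above can be sketched with one gsub over the listed characters. The sketch ignores the path and limit parameters and always applies the strict rule; safe_filename is a hypothetical name.

```lua
-- Hypothetical sketch: always applies the strict replacement, ignoring path and limit.
local function safe_filename(str)
    -- Inside a Lua character set, only % is an escape, so these are all literal.
    return (string.gsub(str, '[\\/:*?"<>|]', "_"))
end

print(safe_filename('Who? What: a/b "guide".epub'))
```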
splitFilePathName (file)
Splits a file into its directory path and file name. If the given path has a trailing /, returns the entire path as the directory path and "" as the file name.

Parameters:

Returns:

    string path, filename
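
The split, including the trailing-slash case, can be sketched as follows. Keeping the final / on the directory part and returning "" as the directory when there is no slash are assumptions of this sketch; split_path_name is a hypothetical name.

```lua
-- Hypothetical sketch, not the util.splitFilePathName source.
local function split_path_name(file)
    -- Greedy ".*/" captures everything up to and including the last slash.
    local dir, name = string.match(file, "^(.*/)([^/]*)$")
    if not dir then
        return "", file -- assumption: no slash means no directory part
    end
    return dir, name
end

print(split_path_name("/mnt/books/a.epub"))
print(split_path_name("/mnt/books/"))
```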
splitFileNameSuffix (file)
Splits a file name into its pure file name and suffix

Parameters:

Returns:

    string path, extension
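
A sketch of the split on the last dot. Returning an empty suffix when the name contains no dot is an assumption; split_name_suffix is a hypothetical name.

```lua
-- Hypothetical sketch, not the util.splitFileNameSuffix source.
local function split_name_suffix(file)
    -- Everything before the last dot, then the dot-free remainder.
    local name, suffix = string.match(file, "^(.*)%.([^.]*)$")
    if not name then
        return file, "" -- assumption: no dot means no suffix
    end
    return name, suffix
end

print(split_name_suffix("book.tar.gz"))
```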
getFileNameSuffix (filename)
Gets file extension

Parameters:

Returns:

    string extension
getScriptType (filename)
Companion helper function that returns the script's language, based on the file extension.

Parameters:

Returns:

    string (lowercase) (or nil if !isAllowedScript)
getFriendlySize (size, right_align)
Gets human friendly size as string

Parameters:

  • size int (bytes)
  • right_align bool (by padding with spaces on the left)

Returns:

    string
getFormattedSize (size)
Gets formatted size as string (1273334 => "1,273,334")

Parameters:

  • size int (bytes)

Returns:

    string
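
The thousands grouping in the example (1273334 => "1,273,334") can be sketched by repeatedly inserting a comma before the last ungrouped three digits. The comma separator is taken from the example above; formatted_size is a hypothetical name.

```lua
-- Hypothetical sketch, assuming size is a whole number.
local function formatted_size(size)
    local s = string.format("%d", size)
    local n
    repeat
        -- Each pass splits off one more group of three digits from the leading run.
        s, n = string.gsub(s, "^(-?%d+)(%d%d%d)", "%1,%2")
    until n == 0
    return s
end

print(formatted_size(1273334))
```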
fixUtf8 (str, replacement)
Replaces invalid UTF-8 characters with a replacement string.

Based on http://notebook.kulchenko.com/programming/fixing-malformed-utf8-in-lua. c.f., FixUTF8 @ https://github.com/pkulchenko/ZeroBraneStudio/blob/master/src/util.lua.

Parameters:

  • str string the string to be checked for invalid characters
  • replacement string the string to replace invalid characters with

Returns:

    string valid UTF-8
splitToArray (str, splitter, capture_empty_entity)
Splits input string with the splitter into a table. This function ignores the last empty entity.

Parameters:

  • str string the string to be split
  • splitter string
  • capture_empty_entity bool

Returns:

    an array-like table
unicodeCodepointToUtf8 (c)
Converts a Unicode codepoint (number) to a UTF-8 char. C.f. https://stackoverflow.com/a/4609989 and https://stackoverflow.com/a/38492214.

See utf8charcode in ffi/util for a decoder.

Parameters:

  • c int Unicode codepoint

Returns:

    string UTF-8 char
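
The encoding itself follows the standard UTF-8 bit layout. Below is a sketch using only arithmetic so that it runs on Lua 5.1/LuaJIT as well as newer versions; codepoint_to_utf8 is a hypothetical name, and the sketch does not reject surrogate codepoints.

```lua
-- Hypothetical sketch of standard UTF-8 encoding; returns nil above U+10FFFF.
local function codepoint_to_utf8(c)
    if c < 0x80 then
        return string.char(c)
    elseif c < 0x800 then
        return string.char(0xC0 + math.floor(c / 0x40),
                           0x80 + c % 0x40)
    elseif c < 0x10000 then
        return string.char(0xE0 + math.floor(c / 0x1000),
                           0x80 + math.floor(c / 0x40) % 0x40,
                           0x80 + c % 0x40)
    elseif c < 0x110000 then
        return string.char(0xF0 + math.floor(c / 0x40000),
                           0x80 + math.floor(c / 0x1000) % 0x40,
                           0x80 + math.floor(c / 0x40) % 0x40,
                           0x80 + c % 0x40)
    end
    return nil
end

print(codepoint_to_utf8(0x20AC)) -- the euro sign, three bytes in UTF-8
```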
htmlEntitiesToUtf8 (string)
Replace HTML entities with their UTF-8 encoded equivalent in text.

Supports only basic ones and those with numbers (no support for named entities like &eacute;).

Parameters:

  • string string text with HTML entities

Returns:

    string UTF-8 text
htmlToPlainText (text)
Convert simple HTML to plain text.

This may fail on complex HTML (with styles, scripts, comments), but should be fine enough with simple HTML as found in EPUB's <dc:description>.

Parameters:

Returns:

    string plain text
htmlToPlainTextIfHtml (text)
Converts HTML to plain text if the text seems to be HTML. Detection of HTML is simple and may produce false positives or negatives, but seems quite good at guessing the content type of text found in EPUB's <dc:description>.

Parameters:

  • text string the string with possibly some HTML

Returns:

    string cleaned text
htmlEscape (text)
Encode the HTML entities in a string

Parameters:

  • text string the string to escape

Taken from https://github.com/kernelsauce/turbo/blob/e4a35c2e3fb63f07464f8f8e17252bea3a029685/turbo/escape.lua#L58-L70
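
A minimal sketch of entity escaping with a lookup table. The set of escaped characters here is an assumption and may differ from what util.htmlEscape or the linked turbo code escapes; html_escape is a hypothetical name.

```lua
-- Hypothetical sketch; the exact character set is an assumption.
local html_escapes = {
    ["&"] = "&amp;",
    ["<"] = "&lt;",
    [">"] = "&gt;",
    ['"'] = "&quot;",
    ["'"] = "&#39;",
}

local function html_escape(text)
    -- gsub looks each matched character up in the table.
    return (string.gsub(text, "[&<>\"']", html_escapes))
end

print(html_escape('<a href="x">T&C</a>'))
```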
prettifyCSS (CSS)
Prettifies a CSS stylesheet. Not perfect, but enough to make some ugly CSS readable. By default, each selector and each property is put on its own line. With condensed=true, each full declaration is condensed onto a single line.

Parameters:

Returns:

    string the CSS prettified
urlEncode (text)
Encodes a URL (also known as percent-encoding); see https://en.wikipedia.org/wiki/Percent-encoding

Parameters:

  • text string the string to encode

Returns:

    string encoded string

Taken from https://gist.github.com/liukun/f9ce7d6d14fa45fe9b924a3eed5c3d99
urlDecode (text)
Decode URL (reverse process to util.urlEncode())

Parameters:

  • text string the string to decode

Returns:

    string decoded string

Taken from https://gist.github.com/liukun/f9ce7d6d14fa45fe9b924a3eed5c3d99
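
Percent-encoding and its reverse can be sketched as a pair of gsub calls. This sketch encodes every byte outside the RFC 3986 unreserved set (letters, digits, -, _, ., ~), which may differ from the characters util.urlEncode leaves untouched; url_encode and url_decode are hypothetical names.

```lua
-- Hypothetical sketch; the set of characters left unencoded is an assumption.
local function url_encode(text)
    return (string.gsub(text, "[^%w%-_%.~]", function(c)
        return string.format("%%%02X", string.byte(c))
    end))
end

local function url_decode(text)
    -- Turn each %XX sequence back into the byte it stands for.
    return (string.gsub(text, "%%(%x%x)", function(hex)
        return string.char(tonumber(hex, 16))
    end))
end

local encoded = url_encode("a b&c=d/e")
print(encoded)
print(url_decode(encoded))
```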
checkLuaSyntax (text)
Checks the Lua syntax of a string.

Parameters:

Returns:

    string with parsing error, nil if syntax ok
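
A syntax check like this can be sketched by compiling the text without running it. The sketch returns the raw error message from the Lua compiler; the real function may reformat it. check_lua_syntax is a hypothetical name.

```lua
-- Hypothetical sketch: compile only, never execute the chunk.
local function check_lua_syntax(text)
    -- loadstring exists on Lua 5.1/LuaJIT; newer versions accept a string in load.
    local loader = loadstring or load
    local chunk, err = loader(text)
    if chunk then
        return nil -- syntax is fine
    end
    return err
end

print(check_lua_syntax("local x = 1"))
print(check_lua_syntax("local x = "))
```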
unpackArchive (archive, extract_to)
Unpack an archive. Extract the contents of an archive, detecting its format by filename extension. Inspired by luarocks archive_unpack()

Parameters:

  • archive string: Filename of archive.
  • extract_to string: Destination directory.

Returns:

    boolean or (boolean, string): true on success, false and an error message on failure.

Tables

args
Escape list for shell usage
t
Clear all the elements from a table without reassignment.
t
Dumps a table into a file.

Fields:

  • file string the file to store the table
generated by LDoc 1.4.6 Last updated 2020-07-05 22:54:19