Module util

This module contains miscellaneous helper functions for the KOReader frontend.

Functions

stripePunctuations (text) Strips all punctuation and spaces from a string.
gsplit (str, pattern, capture, capture_empty_entity) Splits a string by a pattern

Lua doesn't have a string.split() function and most of the time you don't really need it because string.gmatch() is enough.

secondsToClock (seconds, withoutSeconds) Converts seconds to a clock string.
tableSize (T) Returns number of keys in a table.
arrayAppend (t1, t2) Appends all elements of t2 to t1.
lastIndexOf (string, ch) Gets the last index of a character in a string.

Returns the index within this string of the last occurrence of the specified character or -1 if the character does not occur.

splitToChars (text) Splits string into a list of UTF-8 characters.
isCJKChar (c) Tests whether c is a CJK character.
hasCJKChar (str) Tests whether str contains CJK characters.
splitToWords (text) Splits text into a list of words, spaces and punctuation.
isSplittable (c, next_c, prev_c) Tests whether a string can be split at this character for multi-line rendering.
getFilesystemType (path) Gets filesystem type of a path.
isEmptyDir (path) Checks if directory is empty.
replaceInvalidChars (str) Replaces characters that are invalid in filenames.
replaceSlashChar (str) Replaces slash with an underscore.
splitFilePathName (file) Splits a file path into its directory path and file name.
splitFileNameSuffix (file) Splits a file name into its base name and suffix.
getFileNameSuffix (filename) Gets the file extension.
getMenuText (item) Adds > to the text of touch menu items that have a submenu.
fixUtf8 (str, replacement) Replaces invalid UTF-8 characters with a replacement string.
splitToArray (str, splitter, capture_empty_entity) Splits input string with the splitter into a table.
unicodeCodepointToUtf8 (c) Converts a Unicode codepoint (number) to a UTF-8 character.
htmlEntitiesToUtf8 (string) Replaces HTML entities in text with their UTF-8 equivalents.
htmlToPlainText (text) Converts simple HTML to plain text. This may fail on complex HTML (with styles, scripts, comments), but should be good enough for simple HTML as found in EPUB's .
htmlToPlainTextIfHtml (text) Converts HTML to plain text if the text seems to be HTML. Detection of HTML is simple and may produce false positives or negatives, but seems quite good at guessing the content type of text found in EPUB's .


Functions

stripePunctuations (text)
Strips all punctuation and spaces from a string.

Parameters:

  • text string the string to be stripped

Returns:

    string stripped text
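A minimal sketch of the documented behaviour, assuming ASCII input: Lua's %p and %s classes only cover ASCII punctuation and whitespace, so multi-byte Unicode punctuation (such as typographic quotes) is left alone. This illustrates the idea and is not KOReader's implementation.

```lua
-- Sketch: remove every ASCII punctuation and whitespace character.
-- Multi-byte UTF-8 punctuation is NOT stripped by this version.
local function stripePunctuations(text)
  if text == nil then return nil end
  -- Parentheses keep only the first return value of gsub (the string).
  return (text:gsub("[%p%s]+", ""))
end

print(stripePunctuations("Hello, world!"))  --> Helloworld
```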
gsplit (str, pattern, capture, capture_empty_entity)
Splits a string by a pattern

Lua doesn't have a string.split() function and most of the time you don't really need it because string.gmatch() is enough. However, string.gmatch() has one significant disadvantage: you can't split a string while matching both the delimited strings and the delimiters themselves without tracking positions and substrings. The gsplit function takes care of this problem.

Author: Peter Odding

License: MIT/X11

Source: http://snippets.luacode.org/snippets/Stringsplitting130

Parameters:

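  • str string string to split
  • pattern string Lua pattern to split on
  • capture bool if true, the matched delimiters are also returned
  • capture_empty_entity bool if true, empty strings between adjacent delimiters are also returned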
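The splitting can be sketched as a coroutine-based iterator. The meaning of the two flags is an assumption inferred from their names and the description above: capture also yields each matched delimiter, and capture_empty_entity also yields the empty string between adjacent delimiters. In this sketch a trailing empty piece is never yielded.

```lua
-- Sketch of a gsplit-style iterator (not the original snippet).
local function gsplit(str, pattern, capture, capture_empty_entity)
  pattern = pattern or "%s+"
  -- A pattern that matches the empty string would never advance.
  assert(not (""):find(pattern), "pattern matches empty string")
  return coroutine.wrap(function()
    local index = 1
    while index <= #str do
      local first, last = str:find(pattern, index)
      if not first then
        coroutine.yield(str:sub(index))  -- remainder after the last delimiter
        return
      end
      -- The piece before the delimiter; empty only between adjacent delimiters.
      if first > index or capture_empty_entity then
        coroutine.yield(str:sub(index, first - 1))
      end
      if capture then
        coroutine.yield(str:sub(first, last))  -- the delimiter itself
      end
      index = last + 1
    end
  end)
end

-- writes [a][,][b][,][,][c]
for piece in gsplit("a,b,,c", ",", true) do
  io.write("[", piece, "]")
end
print()
```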
secondsToClock (seconds, withoutSeconds)
Converts seconds to a clock string.

Source: https://gist.github.com/jesseadams/791673

Parameters:

  • seconds int number of seconds
  • withoutSeconds bool if true, the result has the form 00:00 (hours and minutes); if false, 00:00:00

Returns:

    string clock string in the form of 00:00 or 00:00:00
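A minimal sketch of the conversion, assuming non-negative input and truncation (not rounding) of partial minutes and seconds:

```lua
-- Formats a duration as HH:MM:SS, or HH:MM when withoutSeconds is true.
-- Fractions are truncated; negative input is not handled.
local function secondsToClock(seconds, withoutSeconds)
  seconds = tonumber(seconds) or 0
  local hours = math.floor(seconds / 3600)
  local mins = math.floor((seconds % 3600) / 60)
  if withoutSeconds then
    return string.format("%02d:%02d", hours, mins)
  end
  local secs = math.floor(seconds % 60)
  return string.format("%02d:%02d:%02d", hours, mins, secs)
end

print(secondsToClock(3725))        --> 01:02:05
print(secondsToClock(3725, true))  --> 01:02
```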
tableSize (T)
Returns number of keys in a table.

Parameters:

  • T Lua table

Returns:

    int number of keys in table T
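Counting keys needs a full traversal, because the # length operator only looks at the sequence part (1..n) of a table. A minimal sketch:

```lua
-- Counts every key, both array-style (1, 2, ...) and hash-style.
local function tableSize(T)
  local count = 0
  for _ in pairs(T) do
    count = count + 1
  end
  return count
end

-- prints 3 and 2: # ignores the hash key x
print(tableSize({ "a", "b", x = 1 }), #{ "a", "b", x = 1 })
```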
arrayAppend (t1, t2)
Appends all elements of t2 to the end of t1.

Parameters:

  • t1 Lua table the array to append to (modified in place)
  • t2 Lua table the array whose elements are appended
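A minimal sketch, assuming only the sequence part of t2 matters (ipairs stops at the first nil and skips hash keys):

```lua
-- Appends every array element of t2, in order, to the end of t1.
-- t1 is modified in place; t2 is left unchanged.
local function arrayAppend(t1, t2)
  for _, v in ipairs(t2) do
    t1[#t1 + 1] = v
  end
end

local a = { 1, 2 }
arrayAppend(a, { 3, 4 })
print(table.concat(a, ","))  --> 1,2,3,4
```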
lastIndexOf (string, ch)
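Gets the last index of a character in a string.

Returns the index within this string of the last occurrence of the specified character, or -1 if the character does not occur.

Because ch is matched as a Lua pattern, magic characters must be escaped: to find ., pass %. instead.

Parameters:

  • string string the string to search
  • ch string the character to look for
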
Returns:

    int last occurrence or -1 if not found
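The escaping note makes sense once ch is spliced into a pattern. This sketch assumes that approach: a greedy ".*" followed by ch and a position capture "()" finds the last occurrence.

```lua
-- Sketch: ch is inserted into a Lua pattern, so magic characters
-- such as . % ( ) [ ] * + - ? ^ $ must be escaped with %.
local function lastIndexOf(str, ch)
  -- ".*" is greedy, so "()" captures the position just after the LAST match.
  local i = str:match(".*" .. ch .. "()")
  if i == nil then return -1 end
  return i - 1
end

print(lastIndexOf("a.b.c", "%."))  --> 4
print(lastIndexOf("a.b.c", "."))   --> 5 (an unescaped "." matches any character)
print(lastIndexOf("abc", "%."))    --> -1
```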
splitToChars (text)
Splits string into a list of UTF-8 characters.

Parameters:

  • text string the string to be split.

Returns:

    table list of UTF-8 chars
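Splitting into UTF-8 characters can be sketched with a single pattern: a character is one lead byte (any byte outside 0x80 to 0xBF) followed by its continuation bytes (0x80 to 0xBF). The sketch assumes well-formed input; malformed sequences are not detected.

```lua
-- Each match is one lead byte plus any following continuation bytes.
-- Stray continuation bytes at the very start are skipped, not reported.
local function splitToChars(text)
  local chars = {}
  for c in text:gmatch("[^\128-\191][\128-\191]*") do
    chars[#chars + 1] = c
  end
  return chars
end

local chars = splitToChars("a\195\169\228\184\173")  -- "aé中"
print(#chars)  --> 3
```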
isCJKChar (c)
Tests whether c is a CJK character.

Parameters:

  • c string a single UTF-8 character

Returns:

    boolean true if CJK
hasCJKChar (str)
Tests whether str contains CJK characters.

Parameters:

  • str string the string to test

Returns:

    boolean true if str contains at least one CJK character
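Both checks can be sketched by decoding each UTF-8 character to a code point and comparing it with a list of ranges. Which ranges count as CJK is an assumption here: this sketch uses Hiragana and Katakana (U+3040 to U+30FF), CJK Unified Ideographs (U+4E00 to U+9FFF) and Hangul syllables (U+AC00 to U+D7AF). KOReader's own test may use different ranges.

```lua
-- Decodes the first UTF-8 character of s into a code point.
-- Assumes well-formed input; missing continuation bytes are treated as 0x80.
local function codepoint(s)
  local b1, b2, b3, b4 = s:byte(1, 4)
  if b1 == nil then return nil end
  b2, b3, b4 = b2 or 0x80, b3 or 0x80, b4 or 0x80
  if b1 < 0x80 then return b1 end
  if b1 < 0xE0 then return (b1 - 0xC0) * 0x40 + (b2 - 0x80) end
  if b1 < 0xF0 then return ((b1 - 0xE0) * 0x40 + (b2 - 0x80)) * 0x40 + (b3 - 0x80) end
  return (((b1 - 0xF0) * 0x40 + (b2 - 0x80)) * 0x40 + (b3 - 0x80)) * 0x40 + (b4 - 0x80)
end

-- Hypothetical choice of ranges (see above).
local CJK_RANGES = {
  { 0x3040, 0x30FF },  -- Hiragana and Katakana
  { 0x4E00, 0x9FFF },  -- CJK Unified Ideographs
  { 0xAC00, 0xD7AF },  -- Hangul syllables
}

local function isCJKChar(c)
  local cp = codepoint(c)
  if cp == nil then return false end
  for _, r in ipairs(CJK_RANGES) do
    if cp >= r[1] and cp <= r[2] then return true end
  end
  return false
end

local function hasCJKChar(str)
  -- Walk the string one UTF-8 character at a time.
  for c in str:gmatch("[^\128-\191][\128-\191]*") do
    if isCJKChar(c) then return true end
  end
  return false
end
```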
splitToWords (text)
Splits text into a list of words, spaces and punctuation.

Parameters:

  • text string the text to split

Returns:

    table list of words, spaces and punctuation
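A minimal sketch, assuming ASCII whitespace and punctuation (Lua's %s and %p) as the separators and keeping runs of the same class together. Unlike a CJK-aware splitter, it does not break CJK text into single characters.

```lua
-- Splits text into maximal runs of: word characters (anything that is not
-- ASCII whitespace or punctuation, including all non-ASCII bytes), whitespace,
-- or punctuation. Every byte falls into exactly one class, so the loop always advances.
local function splitToWords(text)
  local words, pos = {}, 1
  while pos <= #text do
    local token = text:match("^[^%s%p]+", pos)
      or text:match("^%s+", pos)
      or text:match("^%p+", pos)
    words[#words + 1] = token
    pos = pos + #token
  end
  return words
end

print(table.concat(splitToWords("Hello, world!"), "|"))  --> Hello|,| |world|!
```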
isSplittable (c, next_c, prev_c)
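Tests whether a string can be split at this character for multi-line rendering. The optional next and previous characters may be provided to help make the decision.

Parameters:

  • c string the character to test
  • next_c string (optional) the following character
  • prev_c string (optional) the preceding character
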
Returns:

    boolean true if splittable, false if not
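The actual rules are not documented here, so this sketch uses a hypothetical policy: a line may break at a space or at a CJK ideograph (each of which stands alone), but not right before common closing punctuation, which is where next_c helps. prev_c is accepted for signature compatibility but unused. The CJK test is a rough byte-level approximation: three-byte sequences with lead bytes 0xE4 to 0xE9, i.e. U+4000 to U+9FFF.

```lua
-- Rough CJK test: 3-byte UTF-8 sequence with lead byte 0xE4-0xE9 (U+4000-U+9FFF).
local function looksCJK(c)
  return c:match("^[\228-\233][\128-\191][\128-\191]$") ~= nil
end

-- Hypothetical line-breaking policy (see above).
local function isSplittable(c, next_c, prev_c)
  -- Never break right before closing punctuation such as "," or ")".
  if next_c ~= nil and next_c:match("^[,%.;:!%?%)]$") then
    return false
  end
  return c == " " or looksCJK(c)
end
```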
getFilesystemType (path)
Gets filesystem type of a path.

Looks up the mount point containing the path in /proc/mounts.

Parameters:

  • path string an absolute path

Returns:

    string filesystem type
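Assuming the lookup means finding the mount point that contains the path, a sketch can pick the longest matching mount point. Parsing is kept separate from reading the file so it can be tried without a Linux /proc; the helper name fsTypeFromMounts is hypothetical.

```lua
-- Returns the filesystem type of the longest mount point that contains path.
-- mounts_text uses the /proc/mounts layout: "device mountpoint fstype options dump pass".
-- Escaped characters in mount points (e.g. \040 for a space) are not decoded.
local function fsTypeFromMounts(mounts_text, path)
  local best_len, best_type = -1, nil
  for line in mounts_text:gmatch("[^\n]+") do
    local mp, fstype = line:match("^%S+%s+(%S+)%s+(%S+)")
    if mp then
      -- Match whole path components only: "/mnt/us" must not match "/mnt/usb".
      local inside = mp == "/" or path == mp
        or path:sub(1, #mp + 1) == mp .. "/"
      if inside and #mp > best_len then
        best_len, best_type = #mp, fstype
      end
    end
  end
  return best_type
end

local function getFilesystemType(path)
  local f = io.open("/proc/mounts", "r")
  if not f then return nil end  -- not Linux, or /proc unavailable
  local text = f:read("*a")
  f:close()
  return fsTypeFromMounts(text, path)
end
```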
isEmptyDir (path)
Checks whether a directory is empty.

Parameters:

  • path string path of the directory

Returns:

    bool true if the directory is empty
replaceInvalidChars (str)
Replaces characters that are invalid in filenames.

Replaces each of the characters \/:*?"<>| with _. These characters are problematic on Windows filesystems; on Linux, only / poses a problem.

Parameters:

  • str string the filename to sanitize

Returns:

    string sanitized filename
replaceSlashChar (str)
Replaces every slash with an underscore.

Parameters:

  • str string the string to modify

Returns:

    string str with each / replaced by _
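Both replacements can be sketched with a single gsub each. The result is wrapped in parentheses because gsub also returns the number of substitutions, which callers of these helpers presumably do not want.

```lua
-- Replaces each of \ / : * ? " < > | with "_".
-- Inside a Lua character class "\" is literal; * and ? are escaped with % for clarity.
local function replaceInvalidChars(str)
  if str == nil then return nil end
  return (str:gsub('[\\/:%*%?"<>|]', "_"))
end

-- Replaces only "/", the path separator.
local function replaceSlashChar(str)
  if str == nil then return nil end
  return (str:gsub("/", "_"))
end

print(replaceInvalidChars('What? "Yes": a/b'))  --> What_ _Yes__ a_b
```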
splitFilePathName (file)
Splits a file path into its directory path and file name.

Parameters:

  • file string the full file path

Returns:

    string, string the directory path and the file name
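A minimal sketch using one greedy pattern, so the split happens at the last slash. Keeping the trailing slash on the returned path is an assumption of this sketch.

```lua
-- Splits "/dir/sub/name.ext" into "/dir/sub/" and "name.ext".
local function splitFilePathName(file)
  if file == nil or file == "" then return "", "" end
  local path, name = file:match("^(.*/)([^/]*)$")
  if path == nil then
    return "", file  -- no slash: the whole string is the file name
  end
  return path, name
end

print(splitFilePathName("/mnt/us/book.epub"))  -- prints /mnt/us/ and book.epub
```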
splitFileNameSuffix (file)
Splits a file name into its base name and suffix.

Parameters:

  • file string the file name

Returns:

    string, string the file name without its suffix, and the suffix
getFileNameSuffix (filename)
Gets the file extension.

Parameters:

  • filename string the file name

Returns:

    string extension
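splitFileNameSuffix and getFileNameSuffix can be sketched together: split at the last dot, and derive the extension from that split. Refusing a suffix that contains "/" (so dots in directory names are ignored) is a design choice of this sketch.

```lua
-- Splits at the LAST dot: "book.tar.gz" -> "book.tar", "gz".
local function splitFileNameSuffix(file)
  if file == nil or file == "" then return "", "" end
  local name, suffix = file:match("^(.*)%.([^%./]*)$")
  if name == nil then
    return file, ""  -- no suffix
  end
  return name, suffix
end

local function getFileNameSuffix(filename)
  local _, suffix = splitFileNameSuffix(filename)
  return suffix
end

print(getFileNameSuffix("book.tar.gz"))  --> gz
```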
getMenuText (item)
Adds > to the text of touch menu items that have a submenu.

Parameters:

  • item table the touch menu item
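A minimal sketch of the idea. The item field names (text, text_func, sub_item_table) and the exact " >" suffix are assumptions for illustration, not taken from this page.

```lua
-- Hypothetical menu item shape: { text = "...", text_func = function, sub_item_table = {...} }
local function getMenuText(item)
  local text = item.text
  if type(item.text_func) == "function" then
    text = item.text_func()  -- a dynamic label takes precedence
  end
  if item.sub_item_table ~= nil then
    text = text .. " >"  -- mark items that open a submenu
  end
  return text
end

print(getMenuText({ text = "Settings", sub_item_table = {} }))  --> Settings >
```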
fixUtf8 (str, replacement)
Replaces invalid UTF-8 characters with a replacement string.

Based on http://notebook.kulchenko.com/programming/fixing-malformed-utf8-in-lua

Parameters:

  • str string the string to be checked for invalid characters
  • replacement string the string to replace invalid characters with

Returns:

    string valid UTF-8
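A simplified sketch that validates the byte structure of each sequence (lead byte plus the right number of continuation bytes) and replaces each offending byte. It does not reject overlong three- and four-byte forms or UTF-16 surrogates (U+D800 to U+DFFF), which a strict validator would. The default replacement, U+FFFD, is an assumption of this sketch.

```lua
local function fixUtf8(str, replacement)
  replacement = replacement or "\239\191\189"  -- U+FFFD REPLACEMENT CHARACTER
  local out, i, n = {}, 1, #str
  while i <= n do
    local b = str:byte(i)
    -- Expected sequence length from the lead byte (nil = invalid lead byte).
    local len
    if b < 0x80 then len = 1
    elseif b >= 0xC2 and b <= 0xDF then len = 2
    elseif b >= 0xE0 and b <= 0xEF then len = 3
    elseif b >= 0xF0 and b <= 0xF4 then len = 4
    end
    local ok = len ~= nil and i + len - 1 <= n
    if ok then
      -- Every following byte must be a continuation byte (0x80-0xBF).
      for j = i + 1, i + len - 1 do
        local c = str:byte(j)
        if c < 0x80 or c > 0xBF then ok = false; break end
      end
    end
    if ok then
      out[#out + 1] = str:sub(i, i + len - 1)
      i = i + len
    else
      out[#out + 1] = replacement  -- replace one bad byte, then resync
      i = i + 1
    end
  end
  return table.concat(out)
end

print(fixUtf8("a\255b", "?"))  --> a?b
```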
splitToArray (str, splitter, capture_empty_entity)
Splits input string with the splitter into a table. This function ignores the last empty entity.

Parameters:

  • str string the string to be split
  • splitter string the delimiter to split on
  • capture_empty_entity bool if true, empty strings between adjacent delimiters are kept

Returns:

    an array-like table
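A minimal sketch of the described behaviour, including dropping the last empty entity. Unlike the real function, which may interpret splitter as a Lua pattern, this sketch treats splitter as plain text (string.find with plain = true).

```lua
local function splitToArray(str, splitter, capture_empty_entity)
  assert(splitter ~= "", "empty splitter would never advance")
  local result, pos = {}, 1
  while true do
    local s, e = str:find(splitter, pos, true)  -- plain find, no pattern magic
    if not s then break end
    local piece = str:sub(pos, s - 1)
    if piece ~= "" or capture_empty_entity then
      result[#result + 1] = piece
    end
    pos = e + 1
  end
  -- The remainder; an empty one (str ends with splitter) is always dropped.
  local last = str:sub(pos)
  if last ~= "" then result[#result + 1] = last end
  return result
end

print(table.concat(splitToArray("a,,b,", ",", true), "|"))  --> a||b
```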
unicodeCodepointToUtf8 (c)
Converts a Unicode codepoint (number) to a UTF-8 character.

Parameters:

  • c int Unicode codepoint

Returns:

    string UTF-8 character
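The encoding follows the standard UTF-8 byte layout. This sketch uses only arithmetic, so it also runs on Lua 5.1 and LuaJIT, which lack bitwise operators; on Lua 5.3+ the built-in utf8.char does the same job. Surrogate code points are not rejected here.

```lua
local function unicodeCodepointToUtf8(c)
  local floor, char = math.floor, string.char
  if c < 0 then
    return nil
  elseif c < 0x80 then        -- 1 byte:  0xxxxxxx
    return char(c)
  elseif c < 0x800 then       -- 2 bytes: 110xxxxx 10xxxxxx
    return char(0xC0 + floor(c / 0x40), 0x80 + c % 0x40)
  elseif c < 0x10000 then     -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return char(0xE0 + floor(c / 0x1000),
                0x80 + floor(c / 0x40) % 0x40,
                0x80 + c % 0x40)
  elseif c < 0x110000 then    -- 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return char(0xF0 + floor(c / 0x40000),
                0x80 + floor(c / 0x1000) % 0x40,
                0x80 + floor(c / 0x40) % 0x40,
                0x80 + c % 0x40)
  end
  return nil  -- beyond U+10FFFF: not a valid code point
end
```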
htmlEntitiesToUtf8 (string)
Replaces HTML entities in text with their UTF-8 equivalents. Only basic entities and numeric ones are supported (there is no support for named entities like &eacute;).

Parameters:

  • string string text with HTML entities

Returns:

    string UTF-8 text
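A sketch covering the supported cases: the five basic named entities and decimal or hexadecimal numeric ones. The exact set of "basic" entities is an assumption. Decoding in a single gsub pass means "&amp;lt;" correctly becomes "&lt;" rather than "<"; unsupported entities are left untouched.

```lua
-- Arithmetic UTF-8 encoder (also works without bitwise operators).
local function toUtf8(c)
  local floor, char = math.floor, string.char
  if c < 0x80 then return char(c) end
  if c < 0x800 then return char(0xC0 + floor(c / 0x40), 0x80 + c % 0x40) end
  if c < 0x10000 then
    return char(0xE0 + floor(c / 0x1000), 0x80 + floor(c / 0x40) % 0x40, 0x80 + c % 0x40)
  end
  if c < 0x110000 then
    return char(0xF0 + floor(c / 0x40000), 0x80 + floor(c / 0x1000) % 0x40,
                0x80 + floor(c / 0x40) % 0x40, 0x80 + c % 0x40)
  end
  return nil
end

local BASIC = { lt = "<", gt = ">", amp = "&", quot = '"', apos = "'" }

local function htmlEntitiesToUtf8(text)
  return (text:gsub("&(#?)([xX]?)(%w+);", function(hash, x, body)
    if hash == "" then
      return BASIC[body]  -- nil keeps the original text (unsupported entity)
    end
    local cp = tonumber(body, x ~= "" and 16 or 10)
    return cp and toUtf8(cp)
  end))
end

print(htmlEntitiesToUtf8("&lt;b&gt; &amp;lt;"))  --> <b> &lt;
```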
htmlToPlainText (text)
Converts simple HTML to plain text. This may fail on complex HTML (with styles, scripts, comments), but should be good enough for simple HTML as found in EPUB's .

Parameters:

  • text string the HTML text to convert

Returns:

    string plain text
htmlToPlainTextIfHtml (text)
Converts HTML to plain text if the text seems to be HTML. Detection of HTML is simple and may produce false positives or negatives, but seems quite good at guessing the content type of text found in EPUB's .

Parameters:

  • text string the string with possibly some HTML

Returns:

    string cleaned text
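htmlToPlainText and htmlToPlainTextIfHtml can be sketched together. Both the conversion rules (line breaks for <br> and </p>, then dropping every tag) and the detection heuristic (something that looks like an opening or closing tag) are hypothetical simplifications; entities are not decoded here.

```lua
-- Hypothetical simple converter.
local function htmlToPlainText(text)
  text = text:gsub("<%s*[bB][rR]%s*/?%s*>", "\n")  -- <br>, <br/>, <BR />
  text = text:gsub("</%s*[pP]%s*>", "\n")          -- end of paragraph
  text = text:gsub("<[^>]*>", "")                  -- any other tag
  text = text:gsub("^%s+", ""):gsub("%s+$", "")    -- trim
  return text
end

-- Heuristic: "<" or "</" followed by a letter and a closing ">".
local function looksLikeHtml(text)
  return text:find("</?%a[^<>]*>") ~= nil
end

local function htmlToPlainTextIfHtml(text)
  if looksLikeHtml(text) then
    return htmlToPlainText(text)
  end
  return text
end

print(htmlToPlainTextIfHtml("2 < 3 and 5 > 4"))  --> 2 < 3 and 5 > 4
```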
generated by LDoc 1.4.6 Last updated 2017-08-17 09:26:49