elemeta.nlp.extractors package

Subpackages

Module contents

elemeta.nlp.extractors.avg_check_basic(tokenizer: Callable[[str], List[str]], condition: Callable[[str], bool]) → Callable[[str], float]

Generic generator for average-token-length functions.

Parameters:
  • tokenizer (Callable[[str],List[str]]) – a function that splits a text into components, usually words

  • condition (Callable[[str],bool]) – a function that returns True if the token should be counted

Returns:

A function that receives text as a string and outputs the average length of the tokens that are valid according to condition.

Return type:

Callable[[str],float]
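
The description above can be illustrated with a minimal sketch. It does not import elemeta and is not the library's implementation: `avg_check_sketch` is a hypothetical stand-in that follows the documented contract, and returning 0.0 when no token passes the condition is an assumption made here to avoid dividing by zero.

```python
from typing import Callable, List


def avg_check_sketch(
    tokenizer: Callable[[str], List[str]],
    condition: Callable[[str], bool],
) -> Callable[[str], float]:
    """Hypothetical stand-in for the documented behavior, not elemeta's code."""

    def avg_length(text: str) -> float:
        # Keep only the tokens that the condition accepts.
        valid = [token for token in tokenizer(text) if condition(token)]
        # Assumption: return 0.0 when no token qualifies (avoids division by zero).
        if not valid:
            return 0.0
        return sum(len(token) for token in valid) / len(valid)

    return avg_length


# Average length of capitalized words, using whitespace tokenization.
avg_capitalized_length = avg_check_sketch(str.split, lambda token: token[0].isupper())
print(avg_capitalized_length("Alice met Bob"))  # 4.0
```

Passing the tokenizer and the condition as arguments keeps the returned function reusable: the same generator can produce an average over stop words, hashtags, or any other token class by swapping the predicate.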

elemeta.nlp.extractors.length_check_basic(tokenizer: Callable[[str], List[str]], condition: Callable[[str], bool]) → Callable[[str], int]

Generic count function generator.

Parameters:
  • tokenizer (Callable[[str],List[str]]) – a function that splits a text into components, usually words

  • condition (Callable[[str],bool]) – a function that returns True if the token should be counted

Returns:

A function that receives text as a string and outputs the number of tokens that are valid according to condition.

Return type:

Callable[[str],int]
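
As a rough illustration of the counting contract described above, the sketch below builds a counter from a tokenizer and a predicate. `length_check_sketch` is a hypothetical name and the body is an assumption about the documented behavior, not elemeta's own code.

```python
from typing import Callable, List


def length_check_sketch(
    tokenizer: Callable[[str], List[str]],
    condition: Callable[[str], bool],
) -> Callable[[str], int]:
    """Hypothetical stand-in for the documented behavior, not elemeta's code."""

    def count(text: str) -> int:
        # Count the tokens for which the condition holds.
        return sum(1 for token in tokenizer(text) if condition(token))

    return count


# Count the purely numeric tokens in a whitespace-tokenized text.
count_numbers = length_check_sketch(str.split, str.isdigit)
print(count_numbers("call 911 or 112 now"))  # 2
```

The returned function yields an integer count, which matches the `Callable[[str], int]` annotation in the signature.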