elemeta.nlp.extractors package
Subpackages
- elemeta.nlp.extractors.high_level package
- Submodules
- elemeta.nlp.extractors.high_level.acronym_count module
- elemeta.nlp.extractors.high_level.avg_word_length module
- elemeta.nlp.extractors.high_level.capital_letters_ratio module
- elemeta.nlp.extractors.high_level.date_count module
- elemeta.nlp.extractors.high_level.detect_language_langdetect module
- elemeta.nlp.extractors.high_level.email_count module
- elemeta.nlp.extractors.high_level.embedding module
- elemeta.nlp.extractors.high_level.emoji_count module
- elemeta.nlp.extractors.high_level.hashtag_count module
- elemeta.nlp.extractors.high_level.hinted_profanity_sentence_count module
- elemeta.nlp.extractors.high_level.hinted_profanity_words_count module
- elemeta.nlp.extractors.high_level.link_count module
- elemeta.nlp.extractors.high_level.mention_count module
- elemeta.nlp.extractors.high_level.must_appear_words_percentage module
- elemeta.nlp.extractors.high_level.ner_identifier module
- elemeta.nlp.extractors.high_level.number_count module
- elemeta.nlp.extractors.high_level.out_of_vocabulary_count module
- elemeta.nlp.extractors.high_level.pii_identify module
- elemeta.nlp.extractors.high_level.punctuation_count module
- elemeta.nlp.extractors.high_level.regex_match_count module
- elemeta.nlp.extractors.high_level.semantic_text_pair_similarity module
- elemeta.nlp.extractors.high_level.sentence_avg_length module
- elemeta.nlp.extractors.high_level.sentence_count module
- elemeta.nlp.extractors.high_level.sentiment_polarity module
- elemeta.nlp.extractors.high_level.sentiment_subjectivity module
- elemeta.nlp.extractors.high_level.special_chars_count module
- elemeta.nlp.extractors.high_level.stop_words_count module
- elemeta.nlp.extractors.high_level.syllable_count module
- elemeta.nlp.extractors.high_level.text_complexity module
- elemeta.nlp.extractors.high_level.text_length module
- elemeta.nlp.extractors.high_level.toxicity_extractor module
- elemeta.nlp.extractors.high_level.unique_word_count module
- elemeta.nlp.extractors.high_level.unique_word_ratio module
- elemeta.nlp.extractors.high_level.word_count module
- elemeta.nlp.extractors.high_level.word_regex_matches_count module
- Module contents
- elemeta.nlp.extractors.low_level package
- Submodules
- elemeta.nlp.extractors.low_level.abstract_text_metafeature_extractor module
- elemeta.nlp.extractors.low_level.abstract_text_pair_metafeature_extractor module
- elemeta.nlp.extractors.low_level.avg_token_length module
- elemeta.nlp.extractors.low_level.hinted_profanity_token_count module
- elemeta.nlp.extractors.low_level.must_appear_tokens_parentage module
- elemeta.nlp.extractors.low_level.regex_token_matches_count module
- elemeta.nlp.extractors.low_level.semantic_embedding_pair_similarity module
- elemeta.nlp.extractors.low_level.semantic_text_to_group_similarity module
- elemeta.nlp.extractors.low_level.tokens_count module
- elemeta.nlp.extractors.low_level.unique_token_count module
- elemeta.nlp.extractors.low_level.unique_token_ratio module
- Module contents
Module contents
- elemeta.nlp.extractors.avg_check_basic(tokenizer: Callable[[str], List[str]], condition: Callable[[str], bool]) → Callable[[str], float]
Generic generator for average-token-length functions.
- Parameters:
tokenizer (Callable[[str],List[str]]) – a function that splits text into components, usually words
condition (Callable[[str],bool]) – a function that returns True if the token should be counted
- Returns:
A function that takes text as a string and returns the average length of the tokens that satisfy condition.
- Return type:
Callable[[str],float]
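The behavior described above can be sketched without importing elemeta. The following is a minimal illustration, not the library's source: the name `avg_check_sketch` is hypothetical, and returning 0.0 when no token satisfies the condition is an assumption, since the documentation does not say what happens in that case.

```python
from typing import Callable, List


def avg_check_sketch(
    tokenizer: Callable[[str], List[str]],
    condition: Callable[[str], bool],
) -> Callable[[str], float]:
    """Hypothetical stand-in that mirrors the documented contract."""

    def avg_length(text: str) -> float:
        # Keep only the tokens the condition accepts.
        valid = [token for token in tokenizer(text) if condition(token)]
        if not valid:
            # Assumption: no matching tokens yields 0.0 rather than an error.
            return 0.0
        return sum(len(token) for token in valid) / len(valid)

    return avg_length


# Average length of purely alphabetic, whitespace-separated tokens.
avg_alpha_length = avg_check_sketch(str.split, str.isalpha)
result = avg_alpha_length("the quick brown fox 42")
print(result)  # 4.0: (3 + 5 + 5 + 3) / 4, "42" is skipped
```

Passing the tokenizer and the condition as separate callables lets the same generator serve word-level and token-level extractors alike.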
- elemeta.nlp.extractors.length_check_basic(tokenizer: Callable[[str], List[str]], condition: Callable[[str], bool]) → Callable[[str], int]
Generic generator for token-count functions.
- Parameters:
tokenizer (Callable[[str],List[str]]) – a function that splits text into components, usually words
condition (Callable[[str],bool]) – a function that returns True if the token should be counted
- Returns:
A function that takes text as a string and returns the number of tokens that satisfy condition.
- Return type:
Callable[[str],int]
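A counting generator with this contract might look like the sketch below. It is an illustration under stated assumptions rather than elemeta's implementation: `length_check_sketch` is a hypothetical name, and `str.split` with `str.isupper` are chosen only as a simple tokenizer and condition.

```python
from typing import Callable, List


def length_check_sketch(
    tokenizer: Callable[[str], List[str]],
    condition: Callable[[str], bool],
) -> Callable[[str], int]:
    """Hypothetical stand-in that mirrors the documented contract."""

    def count(text: str) -> int:
        # Count the tokens for which the condition holds.
        return sum(1 for token in tokenizer(text) if condition(token))

    return count


# Count whitespace-separated tokens written entirely in upper case.
count_upper = length_check_sketch(str.split, str.isupper)
upper_total = count_upper("NASA and ESA launched a probe")
print(upper_total)  # 2: "NASA" and "ESA"
```

The returned function yields an integer count, in line with the `Callable[[str], int]` annotation in the signature.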