Scripts Documentation¶

scripts.annotate Module¶

Functions¶

`annotate_all`(passages[, replace, as_array, …])	Run spaCy pipeline on the given passages, unless already annotated.
	:param passages: iterable of Passage objects, whose layer 0 nodes will have entries added to their `extra` dict
	:param replace: even if a given passage is already annotated, replace with new annotation
	:param as_array: instead of adding `extra` entries to each terminal, set layer 0 extra["doc"] to array of ids
	:param as_extra: set `extra` entries on each terminal
	:param as_tuples: treat input as tuples of (passage text, context), and return context for each passage as-is
	:param lang: optional two-letter language code, will be overridden if passage has "lang" attrib
	:param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading spaCy model
	:param verbose: whether to print annotated text
	:return: generator of annotated passages, which are modified in-place (same objects as input)
`get_passages_with_progress_bar`(filename_patterns)
`is_annotated`(passage[, as_array, as_extra])	Whether the passage is already annotated or only partially annotated
`main`(args)
`write_passage`(passage[, output_format, …])	Write a given UCCA passage in any format.
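
The contract described for `annotate_all` (a generator that yields the same objects it received, annotated in place, and skips already-annotated input unless `replace` is set) can be sketched without spaCy or UCCA. Everything below is hypothetical: plain dicts stand in for Passage objects, `toy_annotate` stands in for the pipeline, and `annotate_all_sketch` is not the library function.

```python
def toy_annotate(text):
    """Stand-in for a real NLP pipeline: tag each token with its length."""
    return [{"text": tok, "length": len(tok)} for tok in text.split()]


def annotate_all_sketch(passages, replace=False):
    """Yield each passage after annotating it in place.

    `passages` is an iterable of dicts with a "text" key (hypothetical
    stand-ins for Passage objects). Passages that already carry an
    "extra" entry are skipped unless `replace` is true.
    """
    for passage in passages:
        if replace or "extra" not in passage:
            passage["extra"] = toy_annotate(passage["text"])
        yield passage  # same object as the input, now annotated


passages = [
    {"text": "a small example"},
    {"text": "already done", "extra": ["kept"]},
]
annotated = list(annotate_all_sketch(passages))
```

Because the function is a generator, nothing is annotated until the result is iterated; wrapping the call in `list(...)` forces the work.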

scripts.convert_1_0_to_1_2 Module¶

Functions¶

`annotate_all`(passages[, replace, as_array, …])	Run spaCy pipeline on the given passages, unless already annotated.
	:param passages: iterable of Passage objects, whose layer 0 nodes will have entries added to their `extra` dict
	:param replace: even if a given passage is already annotated, replace with new annotation
	:param as_array: instead of adding `extra` entries to each terminal, set layer 0 extra["doc"] to array of ids
	:param as_extra: set `extra` entries on each terminal
	:param as_tuples: treat input as tuples of (passage text, context), and return context for each passage as-is
	:param lang: optional two-letter language code, will be overridden if passage has "lang" attrib
	:param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading spaCy model
	:param verbose: whether to print annotated text
	:return: generator of annotated passages, which are modified in-place (same objects as input)
`convert_passage`(passage, report_writer)
`copy_edge`(edge[, parent, child, tag, attrib])
`destroy`(node_or_edge)
`extract_aux`(terminal, parent, grandparent)
`extract_ground`(terminal, parent, grandparent)
`extract_modal`(terminal, parent, grandparent)
`extract_relator`(terminal, parent, grandparent)
`extract_that`(terminal, parent, grandparent)
`fix_punct`(terminal, parent, grandparent)
`fix_root_terminal_child`(terminal, parent, …)
`fix_unary_participant`(terminal, parent, …)
`flag_relator_starts_main_relation`(terminal, …)
`flag_suspected_secondary`(terminal, parent, …)
`fparent`(node_or_edge)
`get_annotation`(terminal, attr)
`get_passages_with_progress_bar`(filename_patterns)
`is_main_relation`(node)
`main`(args)
`move_node`(node, new_parent[, tag])
`remove`(parent, child)
`set_light_verb_function`(terminal, parent, …)
`write_passage`(passage[, output_format, …])	Write a given UCCA passage in any format.

scripts.convert_2_0_to_1_2 Module¶

Functions¶

`convert_passage`(passage, report_writer)
`copy_edge`(edge[, parent, child, tag, attrib])
`destroy`(node_or_edge)
`get_passages_with_progress_bar`(filename_patterns)
`main`(args)
`replace_time_and_quantifier`(edge)
`write_passage`(passage[, output_format, …])	Write a given UCCA passage in any format.

scripts.count_parents_children Module¶

Functions¶

`clip`(l, m)
`get_passages_with_progress_bar`(filename_patterns)
`main`(args)
`plot_histogram`(counter, label[, plot])
`plot_pie`(counter, label[, plot])

scripts.evaluate_db Module¶

The evaluation software for UCCA layer 1.

Functions¶

`evaluate`(guessed, ref[, converter, verbose, …])	Compare two passages and return requested diagnostics and scores, possibly printing them too.
`main`(args)

scripts.evaluate_standard Module¶

The evaluation script for UCCA layer 1.

Functions¶

`check_args`(args)
`main`(args)
`match_by_id`(guessed, ref)
`print_f1`(result, eval_type)
`summarize`(args, results, eval_type)

scripts.find_constructions Module¶

Functions¶

`add_argument`(argparser[, default])
`external_write_mode`(*args, **kwargs)
`extract_candidates`(passage[, constructions, …])	Find candidate edges by constructions in UCCA passage.
`get_passages_with_progress_bar`(filename_patterns)
`main`(args)

scripts.fix_tokenization Module¶

Functions¶

`context`(i, terminals)
`create_token_element`(state, text, is_punctuation)
`create_unit_element`(state, text, tag)
`decode_special_chars`(tokens)
`expand_to_neighboring_punct`(i, is_puncts)	>>> expand_to_neighboring_punct(0, [False, True, True])
`false_indices`(l)
`fix_tokenization`(passage, words_set, lang, cw)
`from_site`(elem)	Converts site XML structure to `core`.Passage object.
`get_parents`(paragraph, elements)
`get_passages_with_progress_bar`(filename_patterns)
`get_tokenizer`([tokenized, lang])
`handle_words_set`(rule, i, terminals, …)	Use set of words to determine the right fix needed.
`insert_punct`(insert_index, …)
`insert_retokenized`(terminal, …)
`insert_retokenized_currency`(i, terminals, …)
`insert_spaces`(tokens)
`is_punct`(text)
`main`(args)
`normalize`(passage[, extra])
`read_dict`(file)
`retokenize`(i, start, end, terminals, …)
`split_apostrophe_to_units`(i, terminals, …)	Split token with apostrophe into Elaborator and Center.
`split_apostrophe_unanalyzable`(i, terminals, …)	Split apostrophe as unanalyzable.
`split_hyphen_to_units`(i, terminals, …)	Split token with hyphen into two different units.
`split_hyphen_unanalyzable`(i, terminals, …)	Split token with hyphens into unanalyzable tokens.
`split_possessive_s_to_units`(i, terminals, …)	Split possessive s into two different units.
`split_possessive_s_unanalyzable`(i, …)	Split possessive s as unanalyzable.
`strip_context`(new_context, old_context, …)	>>> strip_context(["I", "'ve", "done"], ["I", "'ve", "done"], 1, 1)
`to_site`(passage)	Converts a passage to the site XML format.
`write_passage`(passage[, output_format, …])	Write a given UCCA passage in any format.

Classes¶

`Element`
`SiteCfg`	Contains static configuration for conversion to/from the site XML.
`SiteUtil`	Contains utility functions for converting to/from the site XML.
`State`()

Class Inheritance Diagram¶

Inheritance diagram of scripts.fix_tokenization.State

scripts.join_passages Module¶

Functions¶

`get_passages`(filename_patterns, **kwargs)
`main`(args)
`passage2file`(passage, filename[, indent, binary])	Writes a UCCA passage as a standard XML file or a binary pickle.
	:param passage: passage object to write
	:param filename: file name to write to
	:param indent: whether to indent each line
	:param binary: whether to write pickle format (or XML)

scripts.join_sdp Module¶

Functions¶

`main`(args)

scripts.load_word_vectors Module¶

Functions¶

`get_word_vectors`([dim, size, filename, vocab])	Get word vectors from spaCy model or from text file.
	:param dim: dimension to trim vectors to (default: keep original)
	:param size: maximum number of vectors to load (default: all)
	:param filename: text file to load vectors from (default: from spaCy model)
	:param vocab: instead of strings, look up keys of returned dict in vocab (use lang str, e.g.
`main`(args)
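
The `dim` and `size` parameters of `get_word_vectors` (trim each vector, cap the number loaded) can be illustrated with a small text-file reader. This is only a sketch under assumptions: the function name `load_text_vectors` is hypothetical, and the file layout (one word per line followed by whitespace-separated floats, no header line) is assumed rather than taken from the script.

```python
import io


def load_text_vectors(lines, dim=None, size=None):
    """Parse "word v1 v2 ..." lines into a dict of word -> list of floats.

    dim: keep only the first `dim` components (None keeps all).
    size: stop after `size` vectors (None loads all).
    """
    vectors = {}
    for line in lines:
        if size is not None and len(vectors) >= size:
            break
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        vectors[word] = values if dim is None else values[:dim]
    return vectors


# An in-memory file stands in for a vectors file on disk.
sample = io.StringIO("cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\nfish 0.7 0.8 0.9\n")
vectors = load_text_vectors(sample, dim=2, size=2)
```

Here only the first two vectors are read, and each is trimmed to its first two components.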

scripts.normalize Module¶

Functions¶

`get_passages_with_progress_bar`(filename_patterns)
`main`(args)
`normalize`(passage[, extra])
`write_passage`(passage[, output_format, …])	Write a given UCCA passage in any format.

scripts.pickle_to_standard Module¶

Functions¶

`file2passage`(filename)	Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle.
	:param filename: file name to read from
`main`(args)
`passage2file`(passage, filename[, indent, binary])	Writes a UCCA passage as a standard XML file or a binary pickle.
	:param passage: passage object to write
	:param filename: file name to write to
	:param indent: whether to indent each line
	:param binary: whether to write pickle format (or XML)
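
The read-XML-or-fall-back-to-pickle behavior described for `file2passage`, and the `binary` switch of `passage2file`, can be sketched with the standard library alone. The helper names `load_xml_or_pickle` and `save_xml_or_pickle` are hypothetical, and the objects written are plain XML elements and dicts rather than UCCA passages; how the real functions detect the format is not stated in this page, so trying XML first is an assumption.

```python
import os
import pickle
import tempfile
import xml.etree.ElementTree as ET


def load_xml_or_pickle(filename):
    """Return ("xml", root_element) or ("pickle", obj) for the given file."""
    try:
        return "xml", ET.parse(filename).getroot()
    except ET.ParseError:
        # Not well-formed XML: assume it is a binary pickle instead.
        with open(filename, "rb") as f:
            return "pickle", pickle.load(f)


def save_xml_or_pickle(obj, filename, binary=False):
    """Write `obj` as a pickle if `binary`, else as XML (obj must be an Element)."""
    if binary:
        with open(filename, "wb") as f:
            pickle.dump(obj, f)
    else:
        ET.ElementTree(obj).write(filename, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    xml_path = os.path.join(tmp, "p.xml")
    pkl_path = os.path.join(tmp, "p.pickle")
    save_xml_or_pickle(ET.Element("root", passageID="1"), xml_path)
    save_xml_or_pickle({"passageID": "1"}, pkl_path, binary=True)
    xml_kind, xml_root = load_xml_or_pickle(xml_path)
    pkl_kind, pkl_obj = load_xml_or_pickle(pkl_path)
```

Unpickling executes code embedded in the file, so a fallback like this should only be used on files from a trusted source.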

scripts.replace_tokens_by_dict Module¶

Functions¶

`glob`(pathname, *[, recursive])	Return a list of paths matching a pathname pattern.
`main`(args)
`read_dictionary_from_file`(filename)

scripts.site_pickle_to_standard Module¶

Functions¶

`glob`(pathname, *[, recursive])	Return a list of paths matching a pathname pattern.
`main`(args)
`pickle_site2passage`(filename)	Opens a pickle file containing XML in UCCA site format and returns its parsed Passage object
`write_passage`(passage[, output_format, …])	Write a given UCCA passage in any format.

scripts.site_to_standard Module¶

Functions¶

`check_illegal_combinations`(args)
`db2passage`(handle, pid, user)	Gets user's annotation of pid from the DB handle and returns it as a passage.
`fromstring`(text[, parser])	Parse XML document from string constant.
`glob`(pathname, *[, recursive])	Return a list of paths matching a pathname pattern.
`main`(args)
`site2passage`(filename)	Opens a file and returns its parsed Passage object
`write_passage`(passage[, output_format, …])	Write a given UCCA passage in any format.

scripts.site_to_text Module¶

Functions¶

`db2passage`(handle, pid, user)	Gets user's annotation of pid from the DB handle and returns it as a passage.
`fromstring`(text[, parser])	Parse XML document from string constant.
`main`(args)
`site2passage`(filename)	Opens a file and returns its parsed Passage object

scripts.split_corpus Module¶

Functions¶

`copy`(src, dest[, link])
`copyfile`(src, dst, *[, follow_symlinks])	Copy data from src to dst.
`main`(args)
`not_split_dir`(filename)
`numeric`(s)
`split_passages`(directory, train, dev, link)

scripts.standard_to_pickle Module¶

Functions¶

`external_write_mode`(*args, **kwargs)
`file2passage`(filename)	Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle.
	:param filename: file name to read from
`main`(args)
`passage2file`(passage, filename[, indent, binary])	Writes a UCCA passage as a standard XML file or a binary pickle.
	:param passage: passage object to write
	:param filename: file name to write to
	:param indent: whether to indent each line
	:param binary: whether to write pickle format (or XML)

scripts.standard_to_sentences Module¶

Functions¶

`external_write_mode`(*args, **kwargs)
`extract_terminals`(p)	Returns an iterator of the terminals of the passage p.
`get_passages_with_progress_bar`(filename_patterns)
`main`(args)
`normalize`(passage[, extra])
`passage2file`(passage, filename[, indent, binary])	Writes a UCCA passage as a standard XML file or a binary pickle.
	:param passage: passage object to write
	:param filename: file name to write to
	:param indent: whether to indent each line
	:param binary: whether to write pickle format (or XML)
`split2sentences`(passage[, remarks, lang, ids])
`split_passage`(passage, ends[, remarks, ids, …])	Split the passage on the given terminal positions.
	:param passage: passage to split
	:param ends: sequence of positions at which the split passages will end
	:param remarks: add original node ID as remarks to the new nodes
	:param ids: optional iterable of ids, the same length as ends, to set passage IDs for each split
	:param suffix_format: in case ids is None, use this format for the running index suffix
	:param suffix_start: in case ids is None, use this starting index for the running index suffix
	:return: sequence of passages
`warning`(msg, *args, **kwargs)	Log a message with severity ‘WARNING’ on the root logger.
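
The interplay of `ends`, `ids`, `suffix_format` and `suffix_start` in `split_passage` can be illustrated on a flat token list. This is a sketch, not the library code: `split_at_ends` and `base_id` are hypothetical, the default values shown are assumptions, and positions are assumed to be 1-based and inclusive.

```python
def split_at_ends(tokens, ends, ids=None, suffix_format="%03d",
                  suffix_start=0, base_id="p"):
    """Split `tokens` into consecutive chunks ending at each position in `ends`.

    Returns a list of (chunk_id, chunk) pairs. When `ids` is None, IDs are
    built from `base_id` plus a running index rendered with `suffix_format`,
    starting at `suffix_start`. Tokens after the last end are not returned.
    """
    if ids is None:
        ids = [base_id + suffix_format % i
               for i in range(suffix_start, suffix_start + len(ends))]
    chunks, start = [], 0
    for chunk_id, end in zip(ids, ends):
        chunks.append((chunk_id, tokens[start:end]))
        start = end  # the next chunk begins right after this one ends
    return chunks


tokens = ["I", "came", ".", "I", "saw", "."]
parts = split_at_ends(tokens, ends=[3, 6])
```

Passing explicit `ids` (one per entry in `ends`) bypasses the suffix logic entirely, which mirrors the "in case ids is None" wording of the parameter descriptions.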

Classes¶

`Splitter`(sentences[, enum, suffix_format, …])
`count`	count(start=0, step=1) --> count object

Class Inheritance Diagram¶

Inheritance diagram of scripts.standard_to_sentences.Splitter

scripts.standard_to_site Module¶

Functions¶

`external_write_mode`(*args, **kwargs)
`get_passages_with_progress_bar`(filename_patterns)
`main`(args)
`tostring`(element[, encoding, method, …])	Generate string representation of XML element.

scripts.standard_to_text Module¶

Functions¶

`file2passage`(filename)	Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle.
	:param filename: file name to read from
`get_passages_with_progress_bar`(filename_patterns)
`glob`(pathname, *[, recursive])	Return a list of paths matching a pathname pattern.
`main`(args)
`numeric`(x)
`to_text`(passage[, sentences, lang])	Converts from a Passage object to tokenized strings.
`write_text`(passage, f, sentences, lang[, …])

scripts.statistics Module¶

Functions¶

`get_passages_with_progress_bar`(filename_patterns)
`main`(args)

scripts.unique_roles Module¶

Functions¶

`get_passages_with_progress_bar`(filename_patterns)
`main`(args)

scripts.validate Module¶

Functions¶

`Pool`	Returns a process pool object
`check_args`(parser, args)
`external_write_mode`(*args, **kwargs)
`get_passages_with_progress_bar`(filename_patterns)
`main`(args)
`normalize`(passage[, extra])
`print_errors`(passage_id, errors[, id_len])
`validate`(passage[, linkage, multigraph])

Classes¶

`Validator`([normalization, extra, linkage, …])

Class Inheritance Diagram¶

Inheritance diagram of scripts.validate.Validator

scripts.visualize Module¶

Functions¶

`external_write_mode`(*args, **kwargs)
`get_passages`(filename_patterns, **kwargs)
`get_passages_with_progress_bar`(filename_patterns)
`main`(args)
`print_text`(args, text, suffix)
`split2sentences`(passage[, remarks, lang, ids])