Scripts Documentation¶
scripts.annotate Module¶
Functions¶
annotate_all(passages[, replace, as_array, …]) |
Run the spaCy pipeline on the given passages, unless they are already annotated.
:param passages: iterable of Passage objects, whose layer 0 nodes will have entries added to their 'extra' dict
:param replace: replace the annotation even if a given passage is already annotated
:param as_array: instead of adding 'extra' entries to each terminal, set layer 0 extra["doc"] to an array of ids
:param as_extra: add 'extra' entries to each terminal
:param as_tuples: treat input as tuples of (passage text, context), and return the context for each passage as-is
:param lang: optional two-letter language code; overridden if the passage has a "lang" attrib
:param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading the spaCy model
:param verbose: whether to print the annotated text
:return: generator of annotated passages, which are modified in-place (same objects as input) |
get_passages_with_progress_bar(filename_patterns) |
|
is_annotated(passage[, as_array, as_extra]) |
Whether the passage is already annotated or only partially annotated |
main(args) |
|
write_passage(passage[, output_format, …]) |
Write a given UCCA passage in any format. |
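The annotate_all entry describes a generator that skips passages that are already annotated (unless replace is set) and yields the same objects it received, modified in-place. A minimal sketch of that pattern follows. It does not use spaCy or the UCCA classes: the dict-based passage, `is_annotated_sketch`, `annotate_all_sketch` and the `annotate_token` callback are all hypothetical stand-ins introduced for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical stand-in for a passage: {"tokens": [...], "extra": {index: {...}}}.
Passage = Dict[str, object]


def is_annotated_sketch(passage: Passage) -> bool:
    """Return True only if every token index has an entry in passage["extra"]."""
    tokens: List[str] = passage["tokens"]
    extra: Dict[int, dict] = passage["extra"]
    return all(i in extra for i in range(len(tokens)))


def annotate_all_sketch(
    passages: Iterable[Passage],
    annotate_token: Callable[[str], dict],
    replace: bool = False,
) -> Iterator[Passage]:
    """Annotate passages lazily and in-place, yielding the input objects."""
    for passage in passages:
        if replace or not is_annotated_sketch(passage):
            # Overwrite the per-token entries on the original object.
            passage["extra"] = {
                i: annotate_token(token)
                for i, token in enumerate(passage["tokens"])
            }
        yield passage  # same object as the input, not a copy


if __name__ == "__main__":
    fresh = {"tokens": ["Hello", "World"], "extra": {}}
    done = {"tokens": ["Hi"], "extra": {0: {"lower": "already set"}}}
    for p in annotate_all_sketch([fresh, done], lambda t: {"lower": t.lower()}):
        print(p["extra"])
```

Because the function is a generator, nothing is annotated until the result is iterated, so callers that only want the side effect still need to consume it (for example with `list(...)`).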
scripts.convert_1_0_to_1_2 Module¶
Functions¶
annotate_all(passages[, replace, as_array, …]) |
Run the spaCy pipeline on the given passages, unless they are already annotated.
:param passages: iterable of Passage objects, whose layer 0 nodes will have entries added to their 'extra' dict
:param replace: replace the annotation even if a given passage is already annotated
:param as_array: instead of adding 'extra' entries to each terminal, set layer 0 extra["doc"] to an array of ids
:param as_extra: add 'extra' entries to each terminal
:param as_tuples: treat input as tuples of (passage text, context), and return the context for each passage as-is
:param lang: optional two-letter language code; overridden if the passage has a "lang" attrib
:param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading the spaCy model
:param verbose: whether to print the annotated text
:return: generator of annotated passages, which are modified in-place (same objects as input) |
convert_passage(passage, report_writer) |
|
copy_edge(edge[, parent, child, tag, attrib]) |
|
destroy(node_or_edge) |
|
extract_aux(terminal, parent, grandparent) |
|
extract_ground(terminal, parent, grandparent) |
|
extract_modal(terminal, parent, grandparent) |
|
extract_relator(terminal, parent, grandparent) |
|
extract_that(terminal, parent, grandparent) |
|
fix_punct(terminal, parent, grandparent) |
|
fix_root_terminal_child(terminal, parent, …) |
|
fix_unary_participant(terminal, parent, …) |
|
flag_relator_starts_main_relation(terminal, …) |
|
flag_suspected_secondary(terminal, parent, …) |
|
fparent(node_or_edge) |
|
get_annotation(terminal, attr) |
|
get_passages_with_progress_bar(filename_patterns) |
|
is_main_relation(node) |
|
main(args) |
|
move_node(node, new_parent[, tag]) |
|
remove(parent, child) |
|
set_light_verb_function(terminal, parent, …) |
|
write_passage(passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.convert_2_0_to_1_2 Module¶
Functions¶
convert_passage(passage, report_writer) |
|
copy_edge(edge[, parent, child, tag, attrib]) |
|
destroy(node_or_edge) |
|
get_passages_with_progress_bar(filename_patterns) |
|
main(args) |
|
replace_time_and_quantifier(edge) |
|
write_passage(passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.count_parents_children Module¶
Functions¶
clip(l, m) |
|
get_passages_with_progress_bar(filename_patterns) |
|
main(args) |
|
plot_histogram(counter, label[, plot]) |
|
plot_pie(counter, label[, plot]) |
scripts.evaluate_db Module¶
The evaluation software for UCCA layer 1.
scripts.evaluate_standard Module¶
The evaluation script for UCCA layer 1.
Functions¶
check_args(args) |
|
main(args) |
|
match_by_id(guessed, ref) |
|
print_f1(result, eval_type) |
|
summarize(args, results, eval_type) |
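The names match_by_id and print_f1 suggest that guessed and reference passages are paired up and then scored. The listing does not say how the score is computed, so the following is only a generic sketch of precision, recall and F1 over two sets of labeled units, not the script's actual metric. The function name `f1_sketch` and the `(label, start, end)` unit encoding are assumptions made for illustration.

```python
from typing import Hashable, Set, Tuple


def f1_sketch(guessed: Set[Hashable], ref: Set[Hashable]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for two sets of comparable units."""
    matched = len(guessed & ref)  # units present in both sets
    precision = matched / len(guessed) if guessed else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical units encoded as (label, start, end).
    guessed = {("A", 0, 2), ("P", 2, 3), ("D", 3, 4)}
    ref = {("A", 0, 2), ("P", 2, 3)}
    p, r, f = f1_sketch(guessed, ref)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```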
scripts.find_constructions Module¶
scripts.fix_tokenization Module¶
Functions¶
context(i, terminals) |
|
create_token_element(state, text, is_punctuation) |
|
create_unit_element(state, text, tag) |
|
decode_special_chars(tokens) |
|
expand_to_neighboring_punct(i, is_puncts) |
>>> expand_to_neighboring_punct(0, [False, True, True])
|
false_indices(l) |
|
fix_tokenization(passage, words_set, lang, cw) |
|
from_site(elem) |
Converts site XML structure to core.Passage object. |
get_parents(paragraph, elements) |
|
get_passages_with_progress_bar(filename_patterns) |
|
get_tokenizer([tokenized, lang]) |
|
handle_words_set(rule, i, terminals, …) |
Use a set of words to determine which fix is needed |
insert_punct(insert_index, …) |
|
insert_retokenized(terminal, …) |
|
insert_retokenized_currency(i, terminals, …) |
|
insert_spaces(tokens) |
|
is_punct(text) |
|
main(args) |
|
normalize(passage[, extra]) |
|
read_dict(file) |
|
retokenize(i, start, end, terminals, …) |
|
split_apostrophe_to_units(i, terminals, …) |
Split a token with an apostrophe into an Elaborator and a Center. |
split_apostrophe_unanalyzable(i, terminals, …) |
Split apostrophe as unanalyzable. |
split_hyphen_to_units(i, terminals, …) |
Split a token with a hyphen into two different units. |
split_hyphen_unanalyzable(i, terminals, …) |
Split a token with hyphens into unanalyzable tokens. |
split_possessive_s_to_units(i, terminals, …) |
Split possessive s into two different units. |
split_possessive_s_unanalyzable(i, …) |
Split possessive s as unanalyzable. |
strip_context(new_context, old_context, …) |
>>> strip_context(["I", "'ve", "done"], ["I", "'ve", "done"], 1, 1)
|
to_site(passage) |
Converts a passage to the site XML format. |
write_passage(passage[, output_format, …]) |
Write a given UCCA passage in any format. |
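The doctest shown for expand_to_neighboring_punct is cut off before its expected output, so its exact behavior is not documented here. One plausible reading of the name and arguments is that it widens the single-token span at index i over any directly adjacent punctuation tokens. The sketch below implements that reading only; `expand_to_neighboring_punct_sketch` and its half-open `(start, end)` return value are assumptions, not the module's confirmed behavior.

```python
from typing import Sequence, Tuple


def expand_to_neighboring_punct_sketch(i: int, is_puncts: Sequence[bool]) -> Tuple[int, int]:
    """Grow the span [i, i + 1) left and right while the neighbors are punctuation.

    Returns a half-open (start, end) pair of token indices.
    """
    start, end = i, i + 1
    # Extend to the left over consecutive punctuation tokens.
    while start > 0 and is_puncts[start - 1]:
        start -= 1
    # Extend to the right over consecutive punctuation tokens.
    while end < len(is_puncts) and is_puncts[end]:
        end += 1
    return start, end


if __name__ == "__main__":
    # Token 1 is a word with punctuation on both sides.
    print(expand_to_neighboring_punct_sketch(1, [True, False, True, False]))
```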
Classes¶
Element |
|
SiteCfg |
Contains static configuration for conversion to/from the site XML. |
SiteUtil |
Contains utility functions for converting to/from the site XML. |
State() |
scripts.join_passages Module¶
Functions¶
get_passages(filename_patterns, **kwargs) |
|
main(args) |
|
passage2file(passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle.
:param passage: passage object to write
:param filename: file name to write to
:param indent: whether to indent each line
:param binary: whether to write pickle format (or XML) |
scripts.load_word_vectors Module¶
Functions¶
get_word_vectors([dim, size, filename, vocab]) |
Get word vectors from a spaCy model or from a text file.
:param dim: dimension to trim vectors to (default: keep original)
:param size: maximum number of vectors to load (default: all)
:param filename: text file to load vectors from (default: from spaCy model)
:param vocab: instead of strings, look up keys of returned dict in vocab (use lang str, e.g. |
main(args) |
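The dim and size parameters of get_word_vectors trim each vector and cap the number of vectors loaded. The sketch below shows how those two limits could apply when reading from a text file. It assumes a simple format with one word followed by whitespace-separated floats per line and no header line; the listing does not specify the file format, and `read_word_vectors_sketch` is a hypothetical name. The spaCy path and the vocab lookup are not covered.

```python
from typing import Dict, Iterable, List, Optional


def read_word_vectors_sketch(
    lines: Iterable[str],
    dim: Optional[int] = None,
    size: Optional[int] = None,
) -> Dict[str, List[float]]:
    """Parse 'word v1 v2 ...' lines into a dict, trimming to dim and stopping at size."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if size is not None and len(vectors) >= size:
            break  # enough vectors loaded
        values = [float(x) for x in fields[1:]]
        vectors[fields[0]] = values if dim is None else values[:dim]
    return vectors


if __name__ == "__main__":
    sample = ["cat 0.1 0.2 0.3", "dog 0.4 0.5 0.6", "fish 0.7 0.8 0.9"]
    print(read_word_vectors_sketch(sample, dim=2, size=2))
```

Any iterable of lines works as input, so an open file object can be passed directly.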
scripts.normalize Module¶
scripts.pickle_to_standard Module¶
Functions¶
file2passage(filename) |
Opens a file and returns its parsed Passage object. Tries to read it both as a standard XML file and as a binary pickle.
:param filename: file name to read from |
main(args) |
|
passage2file(passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle.
:param passage: passage object to write
:param filename: file name to write to
:param indent: whether to indent each line
:param binary: whether to write pickle format (or XML) |
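file2passage is described as trying both formats, and passage2file as choosing one through binary. A minimal standard-library sketch of that round trip follows, with an `xml.etree.ElementTree` element standing in for a Passage. The names `element2file_sketch` and `file2element_sketch` are hypothetical, the order in which the real function tries the formats is an assumption, and the indent option is left out.

```python
import pickle
import xml.etree.ElementTree as ET


def element2file_sketch(root: ET.Element, filename: str, binary: bool = False) -> None:
    """Write the element as a binary pickle if binary is set, otherwise as XML."""
    if binary:
        with open(filename, "wb") as f:
            pickle.dump(root, f)
    else:
        ET.ElementTree(root).write(filename, encoding="utf-8", xml_declaration=True)


def file2element_sketch(filename: str) -> ET.Element:
    """Try to parse the file as XML first, then fall back to unpickling it."""
    try:
        return ET.parse(filename).getroot()
    except ET.ParseError:
        # Not well-formed XML, so assume it is a pickle.
        # Only unpickle files from a trusted source.
        with open(filename, "rb") as f:
            return pickle.load(f)


if __name__ == "__main__":
    import os
    import tempfile

    root = ET.Element("root", {"id": "1"})
    ET.SubElement(root, "layer", {"id": "0"})
    with tempfile.TemporaryDirectory() as directory:
        for binary, name in ((False, "passage.xml"), (True, "passage.pickle")):
            path = os.path.join(directory, name)
            element2file_sketch(root, path, binary=binary)
            print(name, ET.tostring(file2element_sketch(path)) == ET.tostring(root))
```

Trying the stricter format first means the reader does not need to rely on the file extension.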
scripts.replace_tokens_by_dict Module¶
Functions¶
glob(pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main(args) |
|
read_dictionary_from_file(filename) |
scripts.site_pickle_to_standard Module¶
Functions¶
glob(pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main(args) |
|
pickle_site2passage(filename) |
Opens a pickle file containing XML in UCCA site format and returns its parsed Passage object |
write_passage(passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.site_to_standard Module¶
Functions¶
check_illegal_combinations(args) |
|
db2passage(handle, pid, user) |
Gets the user's annotation of pid from the DB handle and returns it as a passage |
fromstring(text[, parser]) |
Parse XML document from string constant. |
glob(pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main(args) |
|
site2passage(filename) |
Opens a file and returns its parsed Passage object |
write_passage(passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.site_to_text Module¶
Functions¶
db2passage(handle, pid, user) |
Gets the user's annotation of pid from the DB handle and returns it as a passage |
fromstring(text[, parser]) |
Parse XML document from string constant. |
main(args) |
|
site2passage(filename) |
Opens a file and returns its parsed Passage object |
scripts.split_corpus Module¶
Functions¶
copy(src, dest[, link]) |
|
copyfile(src, dst, *[, follow_symlinks]) |
Copy data from src to dst. |
main(args) |
|
not_split_dir(filename) |
|
numeric(s) |
|
split_passages(directory, train, dev, link) |
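split_passages takes train and dev arguments, and the module also lists a numeric helper, which suggests that files are ordered numerically and then divided into consecutive groups. That reading is an assumption, as is treating train and dev as counts with the remainder forming a test set. The sketch below only partitions a list of names and does not copy or link any files; `numeric_key` and `split_names_sketch` are hypothetical names.

```python
import re
from typing import Dict, Iterable, List, Union


def numeric_key(name: str) -> List[Union[int, str]]:
    """Sort key that compares digit runs as integers, so '2.xml' sorts before '10.xml'."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


def split_names_sketch(names: Iterable[str], train: int, dev: int) -> Dict[str, List[str]]:
    """Give the first `train` names to train, the next `dev` to dev, and the rest to test."""
    ordered = sorted(names, key=numeric_key)
    return {
        "train": ordered[:train],
        "dev": ordered[train:train + dev],
        "test": ordered[train + dev:],
    }


if __name__ == "__main__":
    print(split_names_sketch(["10.xml", "2.xml", "1.xml", "3.xml"], train=2, dev=1))
```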
scripts.standard_to_pickle Module¶
Functions¶
external_write_mode(*args, **kwargs) |
|
file2passage(filename) |
Opens a file and returns its parsed Passage object. Tries to read it both as a standard XML file and as a binary pickle.
:param filename: file name to read from |
main(args) |
|
passage2file(passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle.
:param passage: passage object to write
:param filename: file name to write to
:param indent: whether to indent each line
:param binary: whether to write pickle format (or XML) |
scripts.standard_to_sentences Module¶
Functions¶
external_write_mode(*args, **kwargs) |
|
extract_terminals(p) |
Returns an iterator over the terminals of the passage p |
get_passages_with_progress_bar(filename_patterns) |
|
main(args) |
|
normalize(passage[, extra]) |
|
passage2file(passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle.
:param passage: passage object to write
:param filename: file name to write to
:param indent: whether to indent each line
:param binary: whether to write pickle format (or XML) |
split2sentences(passage[, remarks, lang, ids]) |
|
split_passage(passage, ends[, remarks, ids, …]) |
Split the passage on the given terminal positions.
:param passage: passage to split
:param ends: sequence of positions at which the split passages will end
:param remarks: add original node ID as remarks to the new nodes
:param ids: optional iterable of ids, the same length as ends, to set passage IDs for each split
:param suffix_format: in case ids is None, use this format for the running index suffix
:param suffix_start: in case ids is None, use this starting index for the running index suffix
:return: sequence of passages |
warning(msg, *args, **kwargs) |
Log a message with severity ‘WARNING’ on the root logger. |
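The ends, ids, suffix_format and suffix_start parameters of split_passage interact: each entry in ends closes one piece, and each piece takes its ID either from ids or from a running index suffix. The sketch below illustrates that interaction on a plain list of tokens instead of a Passage. The function name, the `passage_id` argument, the 1-based inclusive reading of ends, and the default values `"%03d"` and `0` are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def split_on_ends_sketch(
    terminals: Sequence[str],
    ends: Sequence[int],
    passage_id: str,
    ids: Optional[Iterable[str]] = None,
    suffix_format: str = "%03d",  # assumed default
    suffix_start: int = 0,  # assumed default
) -> List[Tuple[str, List[str]]]:
    """Cut terminals into consecutive pieces, each ending at a position in ends.

    Positions are treated as 1-based and inclusive, so they can be used
    directly as exclusive 0-based slice bounds.
    """
    id_iter = iter(ids) if ids is not None else None
    pieces: List[Tuple[str, List[str]]] = []
    start = 0
    for index, end in enumerate(ends):
        if id_iter is not None:
            piece_id = next(id_iter)  # explicit ID supplied by the caller
        else:
            # Original ID followed by a formatted running index.
            piece_id = passage_id + suffix_format % (suffix_start + index)
        pieces.append((piece_id, list(terminals[start:end])))
        start = end
    return pieces


if __name__ == "__main__":
    tokens = ["I", "ran", ".", "You", "sat", "."]
    for piece in split_on_ends_sketch(tokens, ends=[3, 6], passage_id="120"):
        print(piece)
```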
scripts.standard_to_site Module¶
scripts.standard_to_text Module¶
Functions¶
file2passage(filename) |
Opens a file and returns its parsed Passage object. Tries to read it both as a standard XML file and as a binary pickle.
:param filename: file name to read from |
get_passages_with_progress_bar(filename_patterns) |
|
glob(pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main(args) |
|
numeric(x) |
|
to_text(passage[, sentences, lang]) |
Converts from a Passage object to tokenized strings. |
write_text(passage, f, sentences, lang[, …]) |
scripts.unique_roles Module¶
scripts.validate Module¶
Functions¶
Pool |
Returns a process pool object |
check_args(parser, args) |
|
external_write_mode(*args, **kwargs) |
|
get_passages_with_progress_bar(filename_patterns) |
|
main(args) |
|
normalize(passage[, extra]) |
|
print_errors(passage_id, errors[, id_len]) |
|
validate(passage[, linkage, multigraph]) |
scripts.visualize Module¶
Functions¶
external_write_mode(*args, **kwargs) |
|
get_passages(filename_patterns, **kwargs) |
|
get_passages_with_progress_bar(filename_patterns) |
|
main(args) |
|
print_text(args, text, suffix) |
|
split2sentences(passage[, remarks, lang, ids]) |