For more information about how to use this library, see the API Documentation.
API Documentation¶
Getting Started¶
To load UCCA passages from XML files, manipulate them and write to files, use the following code template:
from ucca.ioutil import get_passages_with_progress_bar, write_passage
for passage in get_passages_with_progress_bar(filenames):
    ...
    write_passage(passage)
Each passage is an instance of the ucca.core.Passage class.
XML files can be downloaded from the various UCCA corpora.
ucca.constructions Module¶
Functions¶
add_argument (argparser[, default]) |
|
create_category_construction (tag) |
|
create_passage_yields (p, *args[, tags]) |
|
diff_terminals (*passages) |
|
extract_candidates (passage[, constructions, …]) |
Find candidate edges by constructions in UCCA passage. |
get_by_name (name) |
|
get_by_names ([names]) |
|
positions (terminals) |
|
terminal_ids (passage) |
|
verify_terminals_match (passage, reference) |
Classes¶
Candidate (edge[, reference, …]) |
|
Categories () |
|
Construction (name, description, criterion[, …]) |
|
EdgeTags |
Layer 1 Edge tags. |
NodeTags |
Layer 1 Node tags. |
OrderedDict |
Dictionary that remembers insertion order |
chain |
chain(*iterables) –> chain object |

ucca.convert Module¶
Converter module between different UCCA annotation formats.
This module contains utilities to convert UCCA annotation between different
forms. The core.Passage form acts as a pivot for all conversions.
The possible other formats are:
- site XML
- standard XML
- conll (CoNLL-X dependency parsing shared task)
- sdp (SemEval 2015 semantic dependency parsing shared task)
Functions¶
attach_punct (l0, l1) |
|
file2passage (filename) |
Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle. :param filename: file name to read from |
from_json (lines, *args[, …]) |
Convert text (or dict) in UCCA-App JSON format to a Passage object. |
from_site (elem) |
Converts site XML structure to core.Passage object. |
from_standard (root[, extra_funcs]) |
|
from_text (text[, passage_id, tokenized, …]) |
Converts from tokenized strings to a Passage object. |
get_categories_details (d) |
|
get_json_attrib (d) |
|
join_passages (passages[, passage_id, remarks]) |
Join passages to one passage with all the nodes in order :param passages: sequence of passages to join :param passage_id: ID of newly created passage (otherwise, ID of first passage) :param remarks: add original node ID as remarks to the new nodes :return: joined passage |
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML) |
pickle2passage (filename) |
|
split2paragraphs (passage[, remarks, lang, ids]) |
|
split2segments (passage, is_sentences[, …]) |
Split passage to sub-passages :param passage: Passage object :param is_sentences: if True, split to sentences; otherwise, paragraphs :param remarks: Whether to add remarks with original node IDs :param lang: language to use for sentence splitting model :param ids: optional iterable of ids to set passage IDs for each split :return: sequence of passages |
split2sentences (passage[, remarks, lang, ids]) |
|
split_passage (passage, ends[, remarks, ids, …]) |
Split the passage on the given terminal positions :param passage: passage to split :param ends: sequence of positions at which the split passages will end :param remarks: add original node ID as remarks to the new nodes :param ids: optional iterable of ids, the same length as ends, to set passage IDs for each split :param suffix_format: in case ids is None, use this format for the running index suffix :param suffix_start: in case ids is None, use this starting index for the running index suffix :return: sequence of passages |
to_json (passage, *args[, return_dict, …]) |
Convert a Passage object to text (or dict) in UCCA-App JSON :param passage: the Passage object to convert :param return_dict: whether to return dict rather than list of lines :param tok_task: either None (to do tokenization too), or a completed tokenization task dict with token IDs, or True, to indicate that the function should do only tokenization and not annotation :param all_categories: list of category dicts so that IDs can be added, if available - otherwise names are used :param skip_category_mapping: if False, translate edge tag abbreviations to category names; if True, don’t :return: list of lines in JSON format if return_dict=False, or task dict if True |
to_sequence (passage) |
Converts from a Passage object to linearized text sequence. |
to_site (passage) |
Converts a passage to the site XML format. |
to_standard (passage) |
Converts a Passage object to a standard XML root element. |
to_text (passage[, sentences, lang]) |
Converts from a Passage object to tokenized strings. |
xml2passage (filename) |
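The `split_passage` parameters listed above (`ends`, `ids`, `suffix_format`, `suffix_start`) can be pictured on a plain token list. The following is a toy sketch, not the library's implementation: `split_tokens` is a hypothetical helper, terminal positions are assumed to be 1-based, and the default values chosen here for `suffix_format` and `suffix_start` are illustrative assumptions.

```python
def split_tokens(tokens, ends, passage_id="p", ids=None,
                 suffix_format="%03d", suffix_start=0):
    """Split `tokens` into consecutive segments ending at the 1-based positions in `ends`.

    Returns a list of (segment_id, segment_tokens) pairs. If `ids` is None, each
    segment ID is `passage_id` followed by a running index formatted with
    `suffix_format` (hypothetical defaults, chosen for illustration).
    """
    segments = []
    start = 0
    for index, end in enumerate(ends):
        if ids is not None:
            segment_id = ids[index]
        else:
            segment_id = passage_id + suffix_format % (suffix_start + index)
        segments.append((segment_id, tokens[start:end]))
        start = end
    return segments


tokens = ["John", "left", ".", "Mary", "stayed", "."]
parts = split_tokens(tokens, ends=[3, 6], passage_id="120")
```

Each entry in `ends` closes one segment, so `ids`, when given, needs one ID per entry in `ends`.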
Classes¶
EdgeTags |
Layer 1 Edge tags. |
JSONDecodeError (msg, doc, pos) |
Subclass of ValueError with the following additional properties: |
SiteCfg |
Contains static configuration for conversion to/from the site XML. |
SiteUtil |
Contains utility functions for converting to/from the site XML. |
SiteXMLUnknownElement |
|
attrgetter |
attrgetter(attr, …) –> attrgetter object |
defaultdict |
defaultdict(default_factory[, …]) –> dict with default factory |
groupby (iterable[, key]) |
keys and groups from the iterable. |
itemgetter |
itemgetter(item, …) –> itemgetter object |
repeat (object [,times]) |
for the specified number of times. |

ucca.core Module¶
This module encapsulates the basic elements of the UCCA annotation.
A UCCA annotation is essentially a directed acyclic graph (DAG), which
represents a Passage of text and its annotation. The annotation itself
is divided into Layer objects, where in each layer Node objects
are connected between themselves and to Nodes in other layers using
Edge objects.
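To make the graph model described above concrete, here is a minimal standard-library sketch. It does not use the library's own classes: `ToyNode`, `ToyEdge`, `connect` and `is_acyclic` are hypothetical names, and the IDs and the edge tag `"T"` are only illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    ID: str
    layer: str
    outgoing: list = field(default_factory=list)  # edges leaving this node


@dataclass
class ToyEdge:
    parent: ToyNode
    child: ToyNode
    tag: str


def connect(parent, child, tag):
    """Create a labeled edge from parent to child and register it on the parent."""
    edge = ToyEdge(parent, child, tag)
    parent.outgoing.append(edge)
    return edge


def is_acyclic(nodes):
    """Depth-first search that returns False as soon as a cycle is reachable."""
    state = {}  # node ID -> "visiting" or "done"

    def visit(node):
        if state.get(node.ID) == "done":
            return True
        if state.get(node.ID) == "visiting":
            return False  # back edge found: the graph has a cycle
        state[node.ID] = "visiting"
        ok = all(visit(edge.child) for edge in node.outgoing)
        state[node.ID] = "done"
        return ok

    return all(visit(node) for node in nodes)


# Two terminals in layer "0" and one unit in layer "1" spanning both.
word1 = ToyNode("0.1", layer="0")
word2 = ToyNode("0.2", layer="0")
unit = ToyNode("1.1", layer="1")
connect(unit, word1, "T")
connect(unit, word2, "T")
graph_ok = is_acyclic([unit, word1, word2])
```

The edges here cross layers (from layer "1" down to layer "0"), which mirrors the statement that nodes may be connected to nodes in other layers.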
Functions¶
edge_id_orderkey (edge) |
Key function which sorts Edges by their IDs (using id_orderkey()). |
id_orderkey (node) |
Key function which sorts by layer (string), then by unique ID (int). |
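The ordering described for `id_orderkey` (layer as a string, then unique ID as an integer) matters because a plain string sort would put "1.10" before "1.2". The sketch below illustrates the idea only: `toy_id_orderkey` is a hypothetical function that takes an ID string rather than a node, and it assumes IDs of the form `LAYER.NUMBER`.

```python
def toy_id_orderkey(node_id):
    """Sort key: layer part compared as a string, unique part compared as an integer."""
    layer, unique = node_id.split(".")
    return layer, int(unique)


ids = ["1.10", "1.2", "0.3", "1.1"]
ordered = sorted(ids, key=toy_id_orderkey)
naive = sorted(ids)  # lexicographic order, for comparison
```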
Classes¶
Category (tag[, slot, layer, parent]) |
When considering refinement layers, each edge can have multiple tags sorted in a certain hierarchy. |
DuplicateIdError |
Exception raised when trying to add an element with an existing ID. |
Edge (root, parent, child[, tag, attrib]) |
Labeled edge between two Node objects in UCCA annotation graph. |
FrozenPassageError |
Exception raised when trying to modify a frozen Passage. |
Layer (ID, root[, attrib, orderkey]) |
Group of similar Node objects in UCCA annotation graph. |
MissingNodeError |
Exception raised when trying to access a non-existent Node. |
ModifyPassage (fn) |
Decorator for changing a Passage or any member of it. |
Node (ID, root, tag[, attrib, orderkey]) |
Labeled Node in UCCA annotation graph. |
Passage (ID[, attrib]) |
An annotated text with UCCA annotation graph. |
UCCAError |
Base class for all UCCA package exceptions. |
UnimplementedMethodError |
Exception raised when trying to call a not-yet-implemented method. |

ucca.diffutil Module¶
Functions¶
diff_passages (true_passage, pred_passage[, …]) |
Debug method to print missing or mistaken attributes, nodes and edges |
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML) |
ucca.evaluation Module¶
The evaluation library for UCCA layer 1.
v1.4
- 2016-12-25: move common Fs to root before evaluation
- 2017-01-04: flatten centers, do not add 1 (for root) to mutual
- 2017-01-16: fix bug in moving common Fs
- 2018-04-12: exclude punctuation nodes regardless of edge tag
- 2018-12-11: fix another bug in moving common Fs
- 2019-01-22: support multiple categories per edge
- 2019-11-29: evaluate implicit nodes too (by their parent’s yield)
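As a rough illustration of comparing two annotations by their yields, the sketch below computes precision, recall and F1 over sets of (tag, terminal positions) pairs. It is a simplified assumption about the general idea, not the module's algorithm: `f1_scores` is a hypothetical helper, and it does not apply the preprocessing listed in the changelog (moving common Fs, flattening centers, excluding punctuation).

```python
def f1_scores(guessed, ref):
    """Return (precision, recall, F1) for two sets of (tag, positions) pairs."""
    matches = len(guessed & ref)
    precision = matches / len(guessed) if guessed else 0.0
    recall = matches / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Each unit is (edge tag, frozenset of the terminal positions it spans).
ref = {("A", frozenset({1})), ("P", frozenset({2})), ("A", frozenset({3, 4}))}
guessed = {("A", frozenset({1})), ("P", frozenset({2})), ("D", frozenset({3, 4}))}
precision, recall, f1 = f1_scores(guessed, ref)
```

Dropping the tag from each pair before comparing would give an unlabeled variant of the same score.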
Functions¶
create_passage_yields (p, *args[, tags]) |
|
evaluate (guessed, ref[, converter, verbose, …]) |
Compare two passages and return requested diagnostics and scores, possibly printing them too. |
expand_equivalents (tag_set) |
Returns a set of all the tags in the tag set or those equivalent to them :param tag_set: set of tags (strings) to expand |
get_by_names ([names]) |
|
get_text (p, positions) |
|
get_yield (unit) |
|
move_functions (p1, p2) |
Move any common Fs to the root |
print_tags_and_text (p, yield_tags) |
Classes¶
Counter (**kwds) |
Dict subclass for counting hashable items. |
EdgeTags |
Layer 1 Edge tags. |
Evaluator (verbose, constructions, units, …) |
|
EvaluatorResults (results[, default]) |
|
NodeTags |
Layer 1 Node tags. |
OrderedDict |
Dictionary that remembers insertion order |
Scores (evaluator_results[, name, …]) |
|
SummaryStatistics (num_matches, …[, errors]) |
|
attrgetter |
attrgetter(attr, …) –> attrgetter object |
groupby (iterable[, key]) |
keys and groups from the iterable. |

ucca.ioutil Module¶
Input/output utility functions for UCCA scripts.
Functions¶
contextmanager (func) |
@contextmanager decorator. |
external_write_mode (*args, **kwargs) |
|
file2passage (filename) |
Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle. :param filename: file name to read from |
from_text (text[, passage_id, tokenized, …]) |
Converts from tokenized strings to a Passage object. |
gen_files (files_and_dirs) |
|
get_passages (filename_patterns, **kwargs) |
|
get_passages_with_progress_bar (filename_patterns) |
|
glob (pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML) |
read_files_and_dirs (files_and_dirs[, …]) |
|
resolve_patterns (filename_patterns) |
|
split2segments (passage, is_sentences[, …]) |
Split passage to sub-passages :param passage: Passage object :param is_sentences: if True, split to sentences; otherwise, paragraphs :param remarks: Whether to add remarks with original node IDs :param lang: language to use for sentence splitting model :param ids: optional iterable of ids to set passage IDs for each split :return: sequence of passages |
to_text (passage[, sentences, lang]) |
Converts from a Passage object to tokenized strings. |
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
Classes¶
LazyLoadedPassages (files[, sentences, …]) |
Iterable interface to Passage objects that loads files on-the-go and can be iterated more than once |
ParseError |
|
Passage (ID[, attrib]) |
An annotated text with UCCA annotation graph. |
chain |
chain(*iterables) –> chain object |
defaultdict |
defaultdict(default_factory[, …]) –> dict with default factory |
filterfalse |
filterfalse(function or None, sequence) –> filterfalse object |
tqdm ([iterable, desc, total, leave, file, …]) |
Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested. |
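The `LazyLoadedPassages` description above combines two properties: files are loaded only when reached, and the object can be iterated more than once. The sketch below shows one common way to get both in Python, using plain text files instead of passages. `LazyTexts` is a hypothetical class for illustration and makes no claim about how the library implements its own.

```python
import os
import tempfile


class LazyTexts:
    """Re-iterable view over files: each file is read only when it is reached."""

    def __init__(self, filenames):
        self.filenames = list(filenames)

    def __iter__(self):
        # A fresh generator per call is what allows repeated iteration.
        for filename in self.filenames:
            with open(filename, encoding="utf-8") as f:
                yield f.read()

    def __len__(self):
        return len(self.filenames)


# Demonstration with two temporary files.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "first"), ("b.txt", "second")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    paths.append(path)

texts = LazyTexts(paths)
first_pass = list(texts)
second_pass = list(texts)  # works again, unlike a one-shot generator
```

Because `__iter__` is a generator method rather than the object being a generator itself, each `for` loop over `texts` starts from the first file.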

ucca.layer0 Module¶
Encapsulates all word and punctuation symbols layer.
Layer 0 is the basic layer for all the UCCA annotation, as it includes the
actual words and punctuation marks found in the core.Passage.
Layer 0 has only one type of node, Terminal. This is a subtype of
core.Node, and can have one of two tags: Word or Punctuation.
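The two-tag scheme described above can be illustrated without the library. In this sketch, `toy_terminals` is a hypothetical helper that assigns each token a position and a "Word" or "Punctuation" tag. The 1-based positions and the Unicode-category test for punctuation are assumptions made for the example, and the library's own rule may differ.

```python
import unicodedata


def toy_terminals(tokens):
    """Return (position, text, tag) triples with 1-based positions."""
    terminals = []
    for position, text in enumerate(tokens, start=1):
        # A token counts as punctuation only if every character is in a Unicode "P" category.
        is_punct = all(unicodedata.category(ch).startswith("P") for ch in text)
        terminals.append((position, text, "Punctuation" if is_punct else "Word"))
    return terminals


terminals = toy_terminals(["Hello", ",", "world", "!"])
```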
Classes¶
Layer0 (root[, attrib]) |
Represents the Terminal objects layer. |
NodeTags |
|
Terminal (ID, root, tag[, attrib, orderkey]) |
Layer 0 Node type, represents a word or a punctuation mark. |

ucca.layer1 Module¶
Describes the foundational level elements (layer 1) of the UCCA annotation.
Layer 1 is the foundational layer of UCCA, whose Nodes and Edges represent scene objects and relations. The basic building blocks of this layer are the FNode, which is a participant in a scene relation (including the relation itself), and the various Edges between these Nodes, which represent the type of relation between the Nodes.
Classes¶
EdgeTags |
Layer 1 Edge tags. |
FoundationalNode (ID, root, tag[, attrib, …]) |
The basic building block of UCCA annotation, represents semantic units. |
Layer1 (root[, attrib, orderkey]) |
|
Linkage (ID, root, tag[, attrib, orderkey]) |
A Linkage between parallel scenes. |
MissingRelationError |
Exception raised when a required edge is not present. |
NodeTags |
Layer 1 Node tags. |
PunctNode (ID, root, tag[, attrib, orderkey]) |
Encapsulates punctuation layer0.Terminal objects. |

ucca.normalization Module¶
Functions¶
attach_punct (l0, l1) |
|
attach_terminals (l0, l1) |
|
copy_edge (edge[, parent, child, tag, attrib]) |
|
destroy (node_or_edge) |
|
detach_punct (l1) |
|
flatten_centers (node) |
Whenever there are Cs inside Cs, remove the external C. |
flatten_functions (node) |
Whenever there is an F as an only child, remove it. |
flatten_participants (node) |
Whenever there is an A as an only child, remove it. |
flatten_scenes (node) |
Whenever there is an H with H inside, remove the top one |
fparent (node_or_edge) |
|
lowest_common_ancestor (*nodes) |
|
move_elements (node, tags, parent_tags[, forward]) |
|
move_scene_elements (node) |
|
move_sub_scene_elements (node) |
|
nearest_parent (l0, *terminals) |
|
nearest_word (l0, position, step) |
|
normalize (passage[, extra]) |
|
normalize_node (node, l1, extra) |
|
reattach_punct (l0, l1) |
|
reattach_terminals (l0, l1) |
|
remove (parent, child) |
|
remove_unmarked_implicits (node) |
|
replace_center (edge) |
|
replace_edge_tags (node) |
|
separate_scenes (node, l1[, top_level]) |
|
split_coordinated_main_rel (node, l1) |
|
traverse_up_centers (node) |
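The `flatten_centers` entry above ("Whenever there are Cs inside Cs, remove the external C") can be pictured on a toy tree. The sketch below is one possible reading of that sentence, not the library's code: `flatten_nested` is a hypothetical function, trees are plain `(tag, children)` tuples, and only the case where the inner unit is the sole child is handled.

```python
def flatten_nested(tree, tag="C"):
    """Collapse a `tag` unit whose only child is another `tag` unit.

    A tree is (tag, children); children is a list of subtrees, or a string for a leaf.
    """
    node_tag, children = tree
    if isinstance(children, str):
        return tree  # leaf: nothing to flatten
    children = [flatten_nested(child, tag) for child in children]
    if node_tag == tag and len(children) == 1 and children[0][0] == tag:
        return children[0]  # drop the external unit, keep the inner one
    return node_tag, children


nested = ("H", [("C", [("C", [("C", "house")])]), ("E", "big")])
flat = flatten_nested(nested)
```

Children are flattened before their parent is checked, so arbitrarily deep chains of nested Cs collapse in a single pass.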
ucca.textutil Module¶
Utility functions for UCCA package.
Functions¶
annotate (passage, *args, **kwargs) |
Run spaCy pipeline on the given passage, unless already annotated :param passage: Passage object, whose layer 0 nodes will be added entries in the `extra’ dict |
annotate_all (passages[, replace, as_array, …]) |
Run spaCy pipeline on the given passages, unless already annotated :param passages: iterable of Passage objects, whose layer 0 nodes will be added entries in the `extra’ dict :param replace: even if a given passage is already annotated, replace with new annotation :param as_array: instead of adding `extra’ entries to each terminal, set layer 0 extra[“doc”] to array of ids :param as_extra: set `extra’ entries to each terminal :param as_tuples: treat input as tuples of (passage text, context), and return context for each passage as-is :param lang: optional two-letter language code, will be overridden if passage has “lang” attrib :param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading spaCy model :param verbose: whether to print annotated text :return: generator of annotated passages, which are actually modified in-place (same objects as input) |
annotate_as_tuples (passages[, replace, …]) |
|
break2paragraphs (passage[, return_terminals]) |
Breaks into paragraphs according to the annotation. |
break2sentences (passage[, lang]) |
Breaks paragraphs into sentences according to the annotation. |
contextmanager (func) |
@contextmanager decorator. |
external_write_mode (*args, **kwargs) |
|
extract_terminals (p) |
returns an iterator of the terminals of the passage p |
get_lang (passage_context) |
|
get_nlp ([lang]) |
Load spaCy model for a given language, determined by `models’ dict or by MODEL_ENV_VAR |
get_tokenizer ([tokenized, lang]) |
|
get_vocab ([vocab, lang]) |
|
get_word_vectors ([dim, size, filename, vocab]) |
Get word vectors from spaCy model or from text file :param dim: dimension to trim vectors to (default: keep original) :param size: maximum number of vectors to load (default: all) :param filename: text file to load vectors from (default: from spaCy model) :param vocab: instead of strings, look up keys of returned dict in vocab (use lang str, e.g. |
indent_xml (xml_as_string) |
Indents a string of XML-like objects. |
is_annotated (passage[, as_array, as_extra]) |
Whether the passage is already annotated or only partially annotated |
load_spacy_model (model) |
|
read_word_vectors (dim, size, filename) |
Read word vectors from text file, with an optional first row indicating size and dimension :param dim: dimension to trim vectors to :param size: maximum number of vectors to load :param filename: text file to load vectors from :return: generator: first element is (#vectors, #dims); and all the rest are (word [string], vector [NumPy array]) |
set_docs (annotated, as_array, as_extra, …) |
Given spaCy annotations, set values in layer0.extra per paragraph if as_array=True, and in Terminal.extra if as_extra=True |
to_annotate (passage_contexts, replace[, …]) |
Filter passages to get only those that require annotation; split to paragraphs and return generator of (list of tokens, (paragraph index, list of Terminals, Passage) + original context appended) tuples |
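The file format described for `read_word_vectors` above (an optional first row with size and dimension, then one word and its vector per row) can be sketched with the standard library alone. `read_vectors` below is a hypothetical stand-in that yields plain lists instead of NumPy arrays. Its header detection (exactly two integer fields on the first row) is an assumption for the example.

```python
import io
import itertools


def read_vectors(lines, dim=None, size=None):
    """Yield (#vectors, #dims) first, then (word, vector) pairs.

    `lines` is any iterable of text lines; `dim` trims vectors, `size` caps their number.
    """
    lines = iter(lines)
    first = next(lines).split()
    if len(first) == 2 and all(item.isdigit() for item in first):
        count, dims = int(first[0]), int(first[1])  # header row present
        pending = []
    else:
        count, dims = None, len(first) - 1  # no header: first row is already a vector
        pending = [first]
    if dim is not None:
        dims = min(dims, dim)
    if size is not None:
        count = size if count is None else min(count, size)
    yield count, dims
    emitted = 0
    for fields in itertools.chain(pending, (line.split() for line in lines)):
        if size is not None and emitted >= size:
            break
        if not fields:
            continue  # skip blank lines
        yield fields[0], [float(x) for x in fields[1:dims + 1]]
        emitted += 1


text = "2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n"
result = list(read_vectors(io.StringIO(text), dim=2))
```

When the file has no header row, the number of vectors is not known in advance, so this sketch reports `None` for the count unless `size` is given.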
Classes¶
Attr |
Wrapper for spaCy Attr, determining order for saving in layer0.extra per token when as_array=True |
Enum |
Generic enumeration. |
OrderedDict |
Dictionary that remembers insertion order |
attrgetter |
attrgetter(attr, …) –> attrgetter object |
deque |
deque([iterable[, maxlen]]) –> deque object |
groupby (iterable[, key]) |
keys and groups from the iterable. |
islice |
islice(iterable, stop) –> islice object islice(iterable, start, stop[, step]) –> islice object |
itemgetter |
itemgetter(item, …) –> itemgetter object |
tqdm ([iterable, desc, total, leave, file, …]) |
Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested. |

ucca.validation Module¶
Functions¶
join (items) |
|
tag_to_edge (edges) |
|
validate (passage[, linkage, multigraph]) |
|
warning (msg, *args, **kwargs) |
Log a message with severity ‘WARNING’ on the root logger. |
Classes¶
ETags |
alias of ucca.layer1.EdgeTags |
L0Tags |
alias of ucca.layer0.NodeTags |
L1Tags |
alias of ucca.layer1.NodeTags |
NodeValidator (node) |
|
attrgetter |
attrgetter(attr, …) –> attrgetter object |
groupby (iterable[, key]) |
keys and groups from the iterable. |

ucca.visualization Module¶
Functions¶
draw (passage[, node_ids]) |
|
node_label (node) |
|
standoff (p) |
Visualize to Standoff .ann format, which can be presented with brat :param p: Passage :return: string in Standoff format |
tex_escape (text) |
|
tikz (p[, indent, node_ids]) |
Visualize to TikZ format :param p: Passage :param indent: indentation size or None for no indentation :param node_ids: whether to include node IDs :return: string in TikZ format |
topological_layout (passage) |
Scripts Documentation¶
scripts.annotate Module¶
Functions¶
annotate_all (passages[, replace, as_array, …]) |
Run spaCy pipeline on the given passages, unless already annotated :param passages: iterable of Passage objects, whose layer 0 nodes will be added entries in the `extra’ dict :param replace: even if a given passage is already annotated, replace with new annotation :param as_array: instead of adding `extra’ entries to each terminal, set layer 0 extra[“doc”] to array of ids :param as_extra: set `extra’ entries to each terminal :param as_tuples: treat input as tuples of (passage text, context), and return context for each passage as-is :param lang: optional two-letter language code, will be overridden if passage has “lang” attrib :param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading spaCy model :param verbose: whether to print annotated text :return: generator of annotated passages, which are actually modified in-place (same objects as input) |
get_passages_with_progress_bar (filename_patterns) |
|
is_annotated (passage[, as_array, as_extra]) |
Whether the passage is already annotated or only partially annotated |
main (args) |
|
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.convert_1_0_to_1_2 Module¶
Functions¶
annotate_all (passages[, replace, as_array, …]) |
Run spaCy pipeline on the given passages, unless already annotated :param passages: iterable of Passage objects, whose layer 0 nodes will be added entries in the `extra’ dict :param replace: even if a given passage is already annotated, replace with new annotation :param as_array: instead of adding `extra’ entries to each terminal, set layer 0 extra[“doc”] to array of ids :param as_extra: set `extra’ entries to each terminal :param as_tuples: treat input as tuples of (passage text, context), and return context for each passage as-is :param lang: optional two-letter language code, will be overridden if passage has “lang” attrib :param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading spaCy model :param verbose: whether to print annotated text :return: generator of annotated passages, which are actually modified in-place (same objects as input) |
convert_passage (passage, report_writer) |
|
copy_edge (edge[, parent, child, tag, attrib]) |
|
destroy (node_or_edge) |
|
extract_aux (terminal, parent, grandparent) |
|
extract_ground (terminal, parent, grandparent) |
|
extract_modal (terminal, parent, grandparent) |
|
extract_relator (terminal, parent, grandparent) |
|
extract_that (terminal, parent, grandparent) |
|
fix_punct (terminal, parent, grandparent) |
|
fix_root_terminal_child (terminal, parent, …) |
|
fix_unary_participant (terminal, parent, …) |
|
flag_relator_starts_main_relation (terminal, …) |
|
flag_suspected_secondary (terminal, parent, …) |
|
fparent (node_or_edge) |
|
get_annotation (terminal, attr) |
|
get_passages_with_progress_bar (filename_patterns) |
|
is_main_relation (node) |
|
main (args) |
|
move_node (node, new_parent[, tag]) |
|
remove (parent, child) |
|
set_light_verb_function (terminal, parent, …) |
|
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.convert_2_0_to_1_2 Module¶
Functions¶
convert_passage (passage, report_writer) |
|
copy_edge (edge[, parent, child, tag, attrib]) |
|
destroy (node_or_edge) |
|
get_passages_with_progress_bar (filename_patterns) |
|
main (args) |
|
replace_time_and_quantifier (edge) |
|
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.count_parents_children Module¶
Functions¶
clip (l, m) |
|
get_passages_with_progress_bar (filename_patterns) |
|
main (args) |
|
plot_histogram (counter, label[, plot]) |
|
plot_pie (counter, label[, plot]) |
scripts.evaluate_db Module¶
The evaluation software for UCCA layer 1.
scripts.evaluate_standard Module¶
The evaluation script for UCCA layer 1.
Functions¶
check_args (args) |
|
main (args) |
|
match_by_id (guessed, ref) |
|
print_f1 (result, eval_type) |
|
summarize (args, results, eval_type) |
scripts.find_constructions Module¶
scripts.fix_tokenization Module¶
Functions¶
context (i, terminals) |
|
create_token_element (state, text, is_punctuation) |
|
create_unit_element (state, text, tag) |
|
decode_special_chars (tokens) |
|
expand_to_neighboring_punct (i, is_puncts) |
>>> expand_to_neighboring_punct(0, [False, True, True])
|
false_indices (l) |
|
fix_tokenization (passage, words_set, lang, cw) |
|
from_site (elem) |
Converts site XML structure to core.Passage object. |
get_parents (paragraph, elements) |
|
get_passages_with_progress_bar (filename_patterns) |
|
get_tokenizer ([tokenized, lang]) |
|
handle_words_set (rule, i, terminals, …) |
use set of words to determine the right fix needed |
insert_punct (insert_index, …) |
|
insert_retokenized (terminal, …) |
|
insert_retokenized_currency (i, terminals, …) |
|
insert_spaces (tokens) |
|
is_punct (text) |
|
main (args) |
|
normalize (passage[, extra]) |
|
read_dict (file) |
|
retokenize (i, start, end, terminals, …) |
|
split_apostrophe_to_units (i, terminals, …) |
split token with apostrophe to Elaborator and Center. |
split_apostrophe_unanalyzable (i, terminals, …) |
Split apostrophe as unanalyzable. |
split_hyphen_to_units (i, terminals, …) |
split token with hyphen to two different units. |
split_hyphen_unanalyzable (i, terminals, …) |
split token with hyphens to unanalyzable tokens. |
split_possessive_s_to_units (i, terminals, …) |
split possessive s to two different units. |
split_possessive_s_unanalyzable (i, …) |
split possessive s as unanalyzable. |
strip_context (new_context, old_context, …) |
>>> strip_context(["I", "'ve", "done"], ["I", "'ve", "done"], 1, 1)
|
to_site (passage) |
Converts a passage to the site XML format. |
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
Classes¶
Element |
|
SiteCfg |
Contains static configuration for conversion to/from the site XML. |
SiteUtil |
Contains utility functions for converting to/from the site XML. |
State () |

scripts.join_passages Module¶
Functions¶
get_passages (filename_patterns, **kwargs) |
|
main (args) |
|
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML) |
scripts.load_word_vectors Module¶
Functions¶
get_word_vectors ([dim, size, filename, vocab]) |
Get word vectors from spaCy model or from text file :param dim: dimension to trim vectors to (default: keep original) :param size: maximum number of vectors to load (default: all) :param filename: text file to load vectors from (default: from spaCy model) :param vocab: instead of strings, look up keys of returned dict in vocab (use lang str, e.g. |
main (args) |
scripts.normalize Module¶
scripts.pickle_to_standard Module¶
Functions¶
file2passage (filename) |
Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle. :param filename: file name to read from |
main (args) |
|
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML) |
scripts.replace_tokens_by_dict Module¶
Functions¶
glob (pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main (args) |
|
read_dictionary_from_file (filename) |
scripts.site_pickle_to_standard Module¶
Functions¶
glob (pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main (args) |
|
pickle_site2passage (filename) |
Opens a pickle file containing XML in UCCA site format and returns its parsed Passage object |
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.site_to_standard Module¶
Functions¶
check_illegal_combinations (args) |
|
db2passage (handle, pid, user) |
Gets the annotation of user to pid from the DB handle - returns a passage |
fromstring (text[, parser]) |
Parse XML document from string constant. |
glob (pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main (args) |
|
site2passage (filename) |
Opens a file and returns its parsed Passage object |
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.site_to_text Module¶
Functions¶
db2passage (handle, pid, user) |
Gets the annotation of user to pid from the DB handle - returns a passage |
fromstring (text[, parser]) |
Parse XML document from string constant. |
main (args) |
|
site2passage (filename) |
Opens a file and returns its parsed Passage object |
scripts.split_corpus Module¶
Functions¶
copy (src, dest[, link]) |
|
copyfile (src, dst, *[, follow_symlinks]) |
Copy data from src to dst. |
main (args) |
|
not_split_dir (filename) |
|
numeric (s) |
|
split_passages (directory, train, dev, link) |
scripts.standard_to_pickle Module¶
Functions¶
external_write_mode (*args, **kwargs) |
|
file2passage (filename) |
Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle. :param filename: file name to read from |
main (args) |
|
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML) |
scripts.standard_to_sentences Module¶
Functions¶
external_write_mode (*args, **kwargs) |
|
extract_terminals (p) |
returns an iterator of the terminals of the passage p |
get_passages_with_progress_bar (filename_patterns) |
|
main (args) |
|
normalize (passage[, extra]) |
|
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML) |
split2sentences (passage[, remarks, lang, ids]) |
|
split_passage (passage, ends[, remarks, ids, …]) |
Split the passage on the given terminal positions :param passage: passage to split :param ends: sequence of positions at which the split passages will end :param remarks: add original node ID as remarks to the new nodes :param ids: optional iterable of ids, the same length as ends, to set passage IDs for each split :param suffix_format: in case ids is None, use this format for the running index suffix :param suffix_start: in case ids is None, use this starting index for the running index suffix :return: sequence of passages |
warning (msg, *args, **kwargs) |
Log a message with severity ‘WARNING’ on the root logger. |

scripts.standard_to_site Module¶
scripts.standard_to_text Module¶
Functions¶
file2passage (filename) |
Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle. :param filename: file name to read from |
get_passages_with_progress_bar (filename_patterns) |
|
glob (pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main (args) |
|
numeric (x) |
|
to_text (passage[, sentences, lang]) |
Converts from a Passage object to tokenized strings. |
write_text (passage, f, sentences, lang[, …]) |
scripts.unique_roles Module¶
scripts.validate Module¶
Functions¶
Pool |
Returns a process pool object |
check_args (parser, args) |
|
external_write_mode (*args, **kwargs) |
|
get_passages_with_progress_bar (filename_patterns) |
|
main (args) |
|
normalize (passage[, extra]) |
|
print_errors (passage_id, errors[, id_len]) |
|
validate (passage[, linkage, multigraph]) |

scripts.visualize Module¶
Functions¶
external_write_mode (*args, **kwargs) |
|
get_passages (filename_patterns, **kwargs) |
|
get_passages_with_progress_bar (filename_patterns) |
|
main (args) |
|
print_text (args, text, suffix) |
|
split2sentences (passage[, remarks, lang, ids]) |
UCCA DB Documentation¶
ucca_db.api Module¶
Functions¶
external_write_mode (*args, **kwargs) |
|
fromstring (text) |
|
fromstring_xml (text[, parser]) |
Parse XML document from string constant. |
get_by_xids (host_name, db_name, xids, **kwargs) |
Returns the passages that correspond to the given list of xids |
get_connection (db_name, host_name) |
Connects to the db on the given host and returns a connection object |
get_cursor (host_name, db_name) |
Creates a cursor to the search path |
get_most_recent_passage_by_uid (uid, …[, …]) |
|
get_most_recent_xids (host_name, db_name, …) |
Returns the most recent xids of the given username. |
get_passage (host_name, db_name, pid) |
Returns the passages with the given id numbers |
get_predicates (host_name, db_name[, …]) |
Returns a list of all the predicates in the UCCA corpus. |
get_uid (host_name, db_name, username) |
Returns the uid matching the given username. |
get_xml_trees (host_name, db_name, pid[, …]) |
Params: db, host, paragraph id, the list of usernames wanted. Optional: graceful: True if no exceptions are to be raised. An exception is raised if a user did not submit an annotation for the passage. Returns a list of XML root elements. |
get_xmls_by_username (host_name, db_name, …) |
|
linkage_type (u) |
Returns the type of the primary linkage the scene participates in. |
main (argv) |
|
print_passages_to_file (host_name, db_name, paids) |
Returns for that user a list of submitted passages and a list of assigned but not submitted passages. |
tostring (element[, encoding, method, …]) |
Generate string representation of XML element. |
unit_length (u) |
Returns the number of terminals (excluding remote units and punctuation) that are descendants of the unit u. |
write_to_db (host_name, db_name, xml, …[, …]) |
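The `fromstring` and `tostring` entries above carry the same summary lines as the functions in Python's `xml.etree.ElementTree`, so they are assumed here to be re-exports of those functions. The sketch below round-trips a small XML string through them; the element and attribute names in the sample are made up for illustration and do not reflect the actual UCCA XML schema.

```python
from xml.etree.ElementTree import fromstring, tostring


def roundtrip(xml_text):
    """Parse XML text into an element, then serialize it back to a string."""
    root = fromstring(xml_text)                  # str -> Element
    text = tostring(root, encoding="unicode")    # Element -> str (not bytes)
    return root, text


# Illustrative document only; tag and attribute names are hypothetical.
sample = '<root passageID="42"><layer layerID="0"/></root>'
root, text = roundtrip(sample)
print(root.tag, root.get("passageID"))
print(text)
```

Passing `encoding="unicode"` makes `tostring` return a `str`; without it the result is `bytes`, which matters when the XML is later written to a text column or file.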
ucca_db.download Module¶
Functions¶
external_write_mode (*args, **kwargs) |
|
get_by_method (method, id_field[, passage_id]) |
|
get_by_xids (host_name, db_name, xids, **kwargs) |
Returns the passages that correspond to the given list of xids |
get_most_recent_passage_by_uid (uid, …[, …]) |
|
main (args) |
|
tostring (element[, encoding, method, …]) |
Generate string representation of XML element. |
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
ucca_db.upload Module¶
Functions¶
get_passages_with_progress_bar (filename_patterns) |
|
main (args) |
|
tostring (element[, encoding, method, …]) |
Generate string representation of XML element. |
upload_passage (xml_root[, site_filename, …]) |
|
write_to_db (host_name, db_name, xml, …[, …]) |
UCCA-App API Documentation¶
uccaapp.api Module¶
Classes¶
ServerAccessor (server_address, email, password) |
Class Inheritance Diagram¶

uccaapp.convert_and_evaluate Module¶
Functions¶
evaluate (guessed, ref[, converter, verbose, …]) |
Compare two passages and return requested diagnostics and scores, possibly printing them too. |
glob (pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main (filenames, write, **kwargs) |
|
read_files_and_dirs (files_and_dirs[, …]) |
|
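As a rough picture of what comparing a guessed passage against a reference can produce, the sketch below computes precision, recall and F1 over sets of labeled units. It is a generic illustration and not the scoring used by `evaluate`: `labeled_f1` is a hypothetical function, and representing a unit as a `(tag, positions)` tuple is an assumption made for this example.

```python
def labeled_f1(guessed, ref):
    """Precision, recall and F1 between two collections of labeled units.

    Each unit is assumed to be a hashable (tag, positions) tuple; a guessed
    unit counts as correct only if the identical tuple is in the reference.
    """
    guessed, ref = set(guessed), set(ref)
    matches = len(guessed & ref)
    precision = matches / len(guessed) if guessed else 0.0
    recall = matches / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


guessed_units = {("A", (1,)), ("P", (2,)), ("A", (3, 4))}
ref_units = {("A", (1,)), ("P", (2,)), ("D", (3,)), ("A", (4,))}
precision, recall, f1 = labeled_f1(guessed_units, ref_units)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```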
uccaapp.copy_categories Module¶
Functions¶
add_arguments (argparser) |
|
main (args) |
uccaapp.create_annotation_tasks Module¶
Classes¶
AnnotationTaskCreator ([project_id]) |
|
ServerAccessor (server_address, email, password) |
|
tqdm ([iterable, desc, total, leave, file, …]) |
Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested. |
Class Inheritance Diagram¶

uccaapp.create_tokenization_tasks Module¶
Classes¶
AnnotationTaskCreator ([project_id]) |
|
ServerAccessor (server_address, email, password) |
|
TokenizationTaskCreator (project_id, **kwargs) |
Class Inheritance Diagram¶

uccaapp.download_task Module¶
Functions¶
from_json (lines, *args[, …]) |
Convert text (or dict) in UCCA-App JSON format to a Passage object. |
main (**kwargs) |
|
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
Classes¶
ServerAccessor (server_address, email, password) |
|
TaskDownloader (**kwargs) |
|
tqdm ([iterable, desc, total, leave, file, …]) |
Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested. |
Class Inheritance Diagram¶

uccaapp.upload_conllu_passages Module¶
Functions¶
from_text (text[, passage_id, tokenized, …]) |
Converts from tokenized strings to a Passage object. |
glob (pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main (**kwargs) |
|
to_json (passage, *args[, return_dict, …]) |
Convert a Passage object to text (or dict) in UCCA-App JSON. :param passage: the Passage object to convert :param return_dict: whether to return dict rather than list of lines :param tok_task: either None (to do tokenization too), or a completed tokenization task dict with token IDs, or True, to indicate that the function should do only tokenization and not annotation :param all_categories: list of category dicts so that IDs can be added, if available - otherwise names are used :param skip_category_mapping: if False, translate edge tag abbreviations to category names; if True, don’t :return: list of lines in JSON format if return_dict=False, or task dict if True |
Classes¶
ConlluPassageUploader (user_id, …) |
|
JSONDecodeError (msg, doc, pos) |
Subclass of ValueError with the following additional properties: msg, doc, pos, lineno, colno |
ServerAccessor (server_address, email, password) |
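The `JSONDecodeError` entry above matches the summary of `json.JSONDecodeError` in the Python standard library, so it is assumed here to be that class. The sketch below shows how its extra properties can be used to report where a malformed task payload failed to parse; `parse_task` is a hypothetical helper, not part of the UCCA-App API.

```python
import json


def parse_task(text):
    """Parse JSON text; on failure, return details about the error location."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # err.doc also holds the full document being parsed; omitted here.
        details = {
            "msg": err.msg,        # unformatted error message
            "pos": err.pos,        # 0-based index in the document
            "lineno": err.lineno,  # 1-based line number
            "colno": err.colno,    # 1-based column number
        }
        return None, details


task, error = parse_task('{"id": }')
print(task, error)
```

Because `JSONDecodeError` subclasses `ValueError`, code that already catches `ValueError` handles it as well, only without access to the position information.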
Class Inheritance Diagram¶

uccaapp.upload_streussel_passages Module¶
Functions¶
from_text (text[, passage_id, tokenized, …]) |
Converts from tokenized strings to a Passage object. |
main (**kwargs) |
|
to_json (passage, *args[, return_dict, …]) |
Convert a Passage object to text (or dict) in UCCA-App JSON. :param passage: the Passage object to convert :param return_dict: whether to return dict rather than list of lines :param tok_task: either None (to do tokenization too), or a completed tokenization task dict with token IDs, or True, to indicate that the function should do only tokenization and not annotation :param all_categories: list of category dicts so that IDs can be added, if available - otherwise names are used :param skip_category_mapping: if False, translate edge tag abbreviations to category names; if True, don’t :return: list of lines in JSON format if return_dict=False, or task dict if True |
Classes¶
ServerAccessor (server_address, email, password) |
|
StreusselPassageUploader (user_id, source_id, …) |
Class Inheritance Diagram¶

uccaapp.upload_task Module¶
Functions¶
get_passages_with_progress_bar (filename_patterns) |
|
main (**kwargs) |
|
to_json (passage, *args[, return_dict, …]) |
Convert a Passage object to text (or dict) in UCCA-App JSON. :param passage: the Passage object to convert :param return_dict: whether to return dict rather than list of lines :param tok_task: either None (to do tokenization too), or a completed tokenization task dict with token IDs, or True, to indicate that the function should do only tokenization and not annotation :param all_categories: list of category dicts so that IDs can be added, if available - otherwise names are used :param skip_category_mapping: if False, translate edge tag abbreviations to category names; if True, don’t :return: list of lines in JSON format if return_dict=False, or task dict if True |
to_text (passage[, sentences, lang]) |
Converts from a Passage object to tokenized strings. |
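The `return_dict` switch described for `to_json` (a task dict when True, a list of JSON lines otherwise) can be pictured with the standard `json` module. This is a sketch of that return convention only: `task_to_json` is a hypothetical function, the sample task fields are invented, and nothing here reproduces how a Passage is actually converted.

```python
import json


def task_to_json(task, return_dict=False):
    """Return the task dict itself, or its JSON serialization split into lines."""
    if return_dict:
        return task
    return json.dumps(task, indent=2).splitlines()


# Invented sample data; real UCCA-App tasks have a different structure.
task = {"id": 7, "tokens": [{"text": "Hello"}, {"text": "world"}]}
lines = task_to_json(task)
print(len(lines), "lines")
print(json.loads("\n".join(lines)) == task)
```

Returning lines suits writing to a file or printing, while the dict form is more convenient when the task is passed on to code that posts it to a server.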
Classes¶
HTTPError (*args, **kwargs) |
An HTTP error occurred. |
JSONDecodeError (msg, doc, pos) |
Subclass of ValueError with the following additional properties: msg, doc, pos, lineno, colno |
ServerAccessor (server_address, email, password) |
|
TaskUploader (user_id, source_id, project_id, …) |
Class Inheritance Diagram¶
