Welcome to cause2e’s documentation!

discovery.py

This module implements the StructureLearner class.

It is used to learn a causal graph from domain knowledge and data. The proposed procedure is as follows:

  1. Read the data from a csv or parquet file.

  2. Preprocess the data (e.g. delete or recombine variables).

  3. Pass the domain knowledge.

  4. Run a causal discovery algorithm.

  5. Check if the graph looks sensible and orient remaining undirected or misdirected edges.

  6. Check if the graph is a DAG conforming to the domain knowledge.

  7. Save the graph in various file formats.

class discovery.StructureLearner(paths, spark=None)

Main class for performing causal discovery.

paths

A cause2e.PathManager managing paths and file names.

data

A pandas.DataFrame containing the data.

transformations

A list storing all the performed preprocessing transformations.

variables

A set containing the names of all variables in the data.

continuous

A set containing the names of all continuous variables in the data.

discrete

A set containing the names of all discrete variables in the data.

knowledge

A dictionary containing domain knowledge about required or forbidden edges in the causal graph. Known quantitative effects can be included for later validation.

graph

A cause2e.Graph representing the causal graph.

add_edge(source, destination, directed=True, show=True)

Adds an edge to the causal graph.

Consider adding the desired edge to the domain knowledge and rerunning the search if you are sure that it belongs in the graph.

Parameters
  • source – A string indicating the name of the source variable. For an edge ‘a -> b’, a is the source.

  • destination – A string indicating the name of the destination variable. For an edge ‘a -> b’, b is the destination.

  • directed – Optional; A boolean indicating if the edge should be a directed one. If not, the roles of source and destination can be exchanged. Defaults to True.

  • show – Optional; A boolean indicating if the resulting graph should be displayed. Defaults to True.

add_variable(name, vals)

Adds a new variable to the data.

Parameters
  • name – A string indicating the name of the new variable.

  • vals – A column of values for the new variable.

binarize_variable(name, one_val, zero_val=None)

Transforms a variable to a binary variable.

Parameters
  • name – A string indicating the name of the target variable.

  • one_val – The value that should be translated to 1.

  • zero_val – Optional; the value that should be translated to 0. Use None if everything except for one_val should be translated to 0. Defaults to None.
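
The mapping can be pictured with a small stand-alone sketch in plain Python. It is not the library's implementation: the helper name binarize is hypothetical, and raising an error for values that match neither one_val nor zero_val is an assumption made only for illustration.

```python
def binarize(values, one_val, zero_val=None):
    """Map one_val to 1 and zero_val (or everything else if None) to 0."""
    result = []
    for v in values:
        if v == one_val:
            result.append(1)
        elif zero_val is None or v == zero_val:
            result.append(0)
        else:
            # Assumption: a value matching neither target is treated as an error.
            raise ValueError(f"unexpected value: {v!r}")
    return result


smoker = ["yes", "no", "unknown", "yes"]
# With zero_val=None, every value other than "yes" becomes 0.
print(binarize(smoker, one_val="yes"))
```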

combine_variables(name, input_cols, func, keep_old=True)

Combines data from existing variables into a new variable.

Parameters
  • name – A string indicating the name of the new variable.

  • input_cols – A list containing the names of the variables that are used for generating the new variable.

  • func – A function describing how the new variable is calculated from the input variables.

  • keep_old – Optional; A boolean indicating if we want to keep the input variables in our data. Defaults to True.
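
As a rough illustration of how name, input_cols, func and keep_old interact, the following sketch works on a plain dictionary of columns instead of a DataFrame. The helper combine_columns is hypothetical, and the assumption that func receives one value per input column, row by row, is not confirmed by the documentation above.

```python
def combine_columns(data, name, input_cols, func, keep_old=True):
    """Return a copy of data with a new column computed from input_cols."""
    n_rows = len(data[input_cols[0]])
    # Assumption: func receives one value per input column, row by row.
    new_col = [func(*(data[col][i] for col in input_cols)) for i in range(n_rows)]
    # Drop the input columns only if keep_old is False.
    result = {
        col: list(vals)
        for col, vals in data.items()
        if keep_old or col not in input_cols
    }
    result[name] = new_col
    return result


data = {"height_m": [1.8, 1.6], "weight_kg": [81.0, 64.0]}
bmi = combine_columns(
    data,
    "bmi",
    ["weight_kg", "height_m"],
    func=lambda w, h: w / h ** 2,
    keep_old=False,
)
print(bmi)
```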

delete_variable(name)

Deletes a variable from the data.

Parameters

name – A string indicating the name of the target variable.

display_graph(edge_analysis=True)

Shows the causal graph.

Parameters

edge_analysis – Optional; A boolean indicating if an analysis about the influence of domain knowledge on the resulting graph should be shown. Defaults to True.

erase_knowledge()

Erases all domain knowledge.

has_edge(source, destination, directed=True)

Checks if the causal graph contains a specific edge.

Parameters
  • source – A string indicating the name of the source variable. For an edge ‘a -> b’, a is the source.

  • destination – A string indicating the name of the destination variable. For an edge ‘a -> b’, b is the destination.

  • directed – Optional; A boolean indicating if the edge should be a directed one. If not, the roles of source and destination can be exchanged. Defaults to True.
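
The effect of the directed flag can be sketched on a toy edge representation. Storing directed edges as tuples and undirected edges as frozensets is an assumption of this sketch and says nothing about how cause2e.Graph stores edges internally.

```python
def has_edge(directed_edges, undirected_edges, source, destination, directed=True):
    """Check for an edge in a toy graph given as two edge sets."""
    if directed:
        # Order matters: only 'source -> destination' counts.
        return (source, destination) in directed_edges
    # Order is irrelevant: source and destination can be exchanged.
    return frozenset((source, destination)) in undirected_edges


directed_edges = {("a", "b")}
undirected_edges = {frozenset(("b", "c"))}

print(has_edge(directed_edges, undirected_edges, "a", "b"))
print(has_edge(directed_edges, undirected_edges, "c", "b", directed=False))
```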

has_node(name)

Checks if the causal graph contains a specific node.

Parameters

name – A string indicating the name of the node in question.

has_undirected_edges()

Checks if the causal graph has undirected edges.

Returns

A boolean that is True if and only if the graph has at least one undirected edge.

is_acyclic()

Checks if the causal graph is acyclic.

The graph is considered acyclic if it has no undirected edges and does not contain any directed cycles.

Returns

A boolean that is True if and only if the graph is acyclic.

Raises

AssertionError – At least one edge is undirected.
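
A minimal sketch of such a check, assuming a toy graph given as a set of directed (source, destination) tuples plus a set of undirected edges. It uses Kahn's algorithm for the cycle test, which may differ from what the library does internally; the function is illustrative and not part of cause2e.

```python
def is_acyclic(directed_edges, undirected_edges=frozenset()):
    """Return True if the directed edges contain no directed cycle."""
    # Mirrors the documented behaviour: undirected edges are not allowed.
    assert not undirected_edges, "At least one edge is undirected."

    nodes = {node for edge in directed_edges for node in edge}
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for source, destination in directed_edges:
        children[source].append(destination)
        indegree[destination] += 1

    # Kahn's algorithm: repeatedly remove nodes without incoming edges.
    queue = [node for node in nodes if indegree[node] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Every node can be removed if and only if there is no directed cycle.
    return visited == len(nodes)


print(is_acyclic({("a", "b"), ("b", "c")}))
print(is_acyclic({("a", "b"), ("b", "c"), ("c", "a")}))
```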

normalize_variable(name)

Replaces a variable by its z-scores.

Parameters

name – A string indicating the name of the target variable.
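
For reference, a z-score is the value minus the mean, divided by the standard deviation. The sketch below computes it with the standard library; using the sample standard deviation (ddof=1) is an assumption, since the documentation does not say which variant is applied.

```python
import statistics


def z_scores(values):
    """Replace each value by (value - mean) / standard deviation."""
    mean = statistics.fmean(values)
    # Assumption: sample standard deviation (ddof=1).
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]


scores = z_scores([2.0, 4.0, 6.0])
print(scores)
```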

normalize_variables()

Replaces data for all variables by their z-scores.

print_edge_analysis()

Analyzes which share of the edges was forced by domain knowledge.

read_csv(**kwargs)

Reads data from a csv file.

read_parquet(**kwargs)

Reads data from a parquet file.

remove_edge(source, destination, directed=True, show=True)

Removes an edge from the causal graph.

Consider forbidding the edge in the domain knowledge and rerunning the search if you are sure that it does not belong in the graph.

Parameters
  • source – A string indicating the name of the source variable. For an edge ‘a -> b’, a is the source.

  • destination – A string indicating the name of the destination variable. For an edge ‘a -> b’, b is the destination.

  • directed – Optional; A boolean indicating if the edge should be a directed one. If not, the roles of source and destination can be exchanged. Defaults to True.

  • show – Optional; A boolean indicating if the resulting graph should be displayed. Defaults to True.

rename_variable(current_name, new_name)

Renames a variable in the data.

Parameters
  • current_name – A string indicating the current name of the variable.

  • new_name – A string indicating the desired new name of the variable.

respects_knowledge()

Checks if the causal graph respects the domain knowledge.

This means that it contains all the edges that were required in the domain knowledge and none of the edges that were forbidden in the domain knowledge.

Returns

A boolean that is True if and only if the graph respects the domain knowledge.
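
Expressed with plain Python sets, the condition described above amounts to a subset test and an empty-intersection test. The function and the tuple representation of edges are illustrative assumptions, not the library's code.

```python
def respects_knowledge(edges, required, forbidden):
    """True if all required edges are present and no forbidden edge is."""
    return required <= edges and not (forbidden & edges)


edges = {("age", "income"), ("education", "income")}
required = {("age", "income")}
forbidden = {("income", "age")}

print(respects_knowledge(edges, required, forbidden))
```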

reverse_edge(source, destination, direction_strict=False, show=True)

Reverses an edge in the causal graph.

Consider adding the desired edge to the domain knowledge and rerunning the search if you are sure that it belongs in the graph in the desired orientation.

Parameters
  • source – A string indicating the name of the source variable. For an edge ‘a -> b’, a is the source.

  • destination – A string indicating the name of the destination variable. For an edge ‘a -> b’, b is the destination.

  • direction_strict – Optional; A boolean indicating if the edge must exist in the direction ‘source -> destination’. If not, the edge ‘destination -> source’ is also detected and reversed if it exists. Defaults to False.

  • show – Optional; A boolean indicating if the resulting graph should be displayed. Defaults to True.
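
The role of direction_strict can be sketched on a set of directed edge tuples. This is an illustration under assumptions: the helper is hypothetical, and raising a ValueError when no matching edge exists is a choice made for the sketch, not documented behaviour.

```python
def reverse_edge(edges, source, destination, direction_strict=False):
    """Return a new edge set with the edge between the two nodes reversed."""
    if (source, destination) in edges:
        old, new = (source, destination), (destination, source)
    elif not direction_strict and (destination, source) in edges:
        # Non-strict mode also accepts the edge in the opposite direction.
        old, new = (destination, source), (source, destination)
    else:
        # Assumption: a missing edge is reported as an error.
        raise ValueError(f"no edge between {source!r} and {destination!r}")
    return (edges - {old}) | {new}


print(reverse_edge({("a", "b")}, "a", "b"))
print(reverse_edge({("b", "a")}, "a", "b"))
```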

run_all_quick_analyses(estimand_types=['nonparametric-ate', 'nonparametric-nde', 'nonparametric-nie'], verbose=False, show_tables=True, show_heatmaps=True, show_validation=True, show_largest_effects=True, generate_pdf_report=True)

Performs all possible quick causal analyses with preset parameters.

Parameters
  • estimand_types – A list of strings indicating the types of causal effects.

  • verbose – Optional; A boolean indicating if verbose output should be displayed for each analysis. Defaults to False.

  • show_tables – Optional; A boolean indicating if the resulting causal estimates should be displayed in tabular form. Defaults to True.

  • show_heatmaps – Optional; A boolean indicating if the resulting causal estimates should be displayed and saved in heatmap form. Defaults to True.

  • show_validation – Optional; A boolean indicating if the resulting causal estimates should be compared to previous expectations. Defaults to True.

  • show_largest_effects – Optional; A boolean indicating if the largest causal effects should be listed. Defaults to True.

  • generate_pdf_report – Optional; A boolean indicating if the causal graph, heatmaps, validations and estimates should be written to files and combined into a pdf. Defaults to True.

Infers the causal graph from the data and domain knowledge with preset parameters.

Parameters
  • verbose – Optional; A boolean indicating if we want verbose output. Defaults to True.

  • keep_vm – A boolean indicating if we want to keep the Java VM (used by TETRAD) alive after the search. This is required to use TETRAD objects afterwards. Defaults to True.

  • show_graph – A boolean indicating if the resulting graph should be shown. Defaults to True.

  • save_graph – A boolean indicating if the resulting graph should be saved. Defaults to True.

Infers the causal graph from the data and domain knowledge.

This is where the causal discovery algorithms are invoked. Currently only algorithms from the TETRAD program are available. The algorithms are called via pycausal, a Python wrapper around the TETRAD program provided by the creators of the original software. It seems that superfluous arguments are ignored, meaning e.g. that passing a score does not cause problems when invoking constraint-based algorithms like PC. Note that you do not need to specify a threshold for distinguishing between discrete and continuous variables, since this is taken care of internally by the cause2e.searcher.

Parameters
  • algo – A string indicating the search algorithm.

  • use_knowledge – Optional; A boolean indicating if we want to use our domain knowledge (some TETRAD algorithms cannot use it). Defaults to True.

  • verbose – Optional; A boolean indicating if we want verbose output. Defaults to True.

  • keep_vm – A boolean indicating if we want to keep the Java VM (used by TETRAD) alive after the search. This is required to use TETRAD objects afterwards. Defaults to True.

  • show_graph – A boolean indicating if the resulting graph should be shown. Defaults to True.

  • save_graph – A boolean indicating if the resulting graph should be saved. Defaults to True.

  • **kwargs – Arguments that are used to further specify parameters for the search. Use show_algo_params to find out which ones need to be passed.

save_graph(file_extension, verbose=True, strict=True)

Saves the causal graph to a file.

Parameters
  • file_extension – A string indicating the desired file extension.

  • verbose – Optional; A boolean indicating if confirmation messages should be printed. Defaults to True.

  • strict – Optional; A boolean indicating if the graph must be acyclic and in accordance with the domain knowledge to allow saving. Defaults to True.

save_graphs(file_extensions=['dot', 'png', 'svg'], verbose=True, strict=True)

Saves the causal graph in various file formats.

Parameters
  • file_extensions – Optional; A list of strings indicating the desired file extensions. Defaults to [‘dot’, ‘png’, ‘svg’].

  • verbose – Optional; A boolean indicating if confirmation messages should be printed. Defaults to True.

  • strict – Optional; A boolean indicating if the graph must be acyclic and in accordance with the domain knowledge to allow saving. Defaults to True.

save_knowledge(file_extension='png', verbose=True)

Saves the knowledge graph to a file.

Parameters
  • file_extension – A string indicating the desired file extension.

  • verbose – Optional; A boolean indicating if confirmation messages should be printed. Defaults to True.

set_knowledge(edge_creator, validation_creator=None, show=True, save=True)

Sets the domain knowledge that we have about the causal graph.

Parameters
  • edge_creator – A cause2e.knowledge.EdgeCreator that has been used to create required and forbidden edges.

  • validation_creator – Optional; A cause2e.knowledge.ValidationCreator that has been used to create a dictionary containing expected quantitative causal effects. These are evaluated after estimation of the effects. Defaults to None.

  • show – Optional; A boolean indicating if information about the passed knowledge should be displayed. Defaults to True.

  • save – Optional; A boolean indicating if information about the passed knowledge should be saved to a png. Defaults to True.

show_algo_info(algo_name)

Shows information about a selected algorithm from the TETRAD program.

Parameters

algo_name – A string indicating the name of the algorithm of interest.

show_algo_params(algo_name, test_name=None, score_name=None)

Shows the parameters that are required for a causal search with the TETRAD program.

Parameters
  • algo_name – A string indicating the name of the algorithm of interest.

  • test_name – Optional; A string indicating the independence test that the algorithm uses. Use show_algo_info to find out if this is a necessary input. Defaults to None.

  • score_name – Optional; A string indicating the search score that the algorithm uses. Use show_algo_info to find out if this is a necessary input. Defaults to None.

show_independence_tests()

Shows all independence tests that the TETRAD program offers.

show_knowledge()

Shows all domain knowledge that is used for causal discovery.

show_search_algos()

Shows all search algorithms that the TETRAD program offers.

show_search_scores()

Shows all search scores that the TETRAD program offers.

class discovery.StructureLearnerDatabricks(paths, spark, display_width=800, display_height=400)

Main class for performing causal discovery on a Databricks cluster.

paths

A cause2e.PathManager managing paths and file names.

data

A pandas.DataFrame containing the data.

transformations

A list storing all the performed preprocessing transformations.

variables

A set containing the names of all variables in the data.

continuous

A set containing the names of all continuous variables in the data.

discrete

A set containing the names of all discrete variables in the data.

knowledge

A dictionary containing domain knowledge about required or forbidden edges in the causal graph. Known quantitative effects can be included for later validation.

graph

A cause2e.Graph representing the causal graph.

str_knowledge_graph

A string that is used to show the passed domain knowledge.

str_graph

A string that is used to show the result of the graph search.

str_report

A string that is used to show the pdf report.

display_graph(edge_analysis=True)

Shows the causal graph.

Parameters

edge_analysis – Optional; A boolean indicating if an analysis about the influence of domain knowledge on the resulting graph should be shown. Defaults to True.

show_knowledge()

Shows all domain knowledge that is used for causal discovery.

estimator.py

This module implements the Estimator class.

It is used to estimate causal effects from the data and the causal graph. The module contains a wrapper around the core functionality of the DoWhy library and some utility methods for transitioning to the estimation phase after a causal discovery phase. The proposed procedure is as follows:

  1. Imitate the preprocessing steps that have been applied to the data before the causal discovery.

  2. Create a causal model from the data, the causal graph and the desired cause-effect pair.

  3. Use do-calculus to algebraically identify a statistical estimand for the desired causal effect.

  4. Estimate the estimand from the data.

  5. Check the robustness of the estimate.

For more information about steps 2-5, please refer to https://microsoft.github.io/dowhy/.

class estimator.Estimator(paths, transformations=[], validation_dict={}, spark=None)

Main class for estimating causal effects.

paths

A cause2e.PathManager managing paths and file names.

data

A pandas.DataFrame containing the data.

transformations

A list storing all the performed preprocessing transformations. Ensures that the data fits the causal graph.

variables

A set containing the names of all variables in the data.

model

A dowhy.causal_model.CausalModel that can identify, estimate and refute causal effects.

treatment

A string indicating the most recent treatment variable.

outcome

A string indicating the most recent outcome variable.

estimand_type

A string indicating the most recent type of causal effect.

estimand

A dowhy.causal_identifier.IdentifiedEstimand indicating the most recent estimand.

estimated_effect

A dowhy.causal_estimator.CausalEstimate indicating the most recent estimated effect.

robustness_info

A dowhy.causal_refuter.CausalRefutation indicating the results of the most recent robustness check.

spark

Optional; A pyspark.sql.SparkSession in case you want to use spark. Defaults to None.

add_variable(name, vals)

Adds a new variable to the data.

Parameters
  • name – A string indicating the name of the new variable.

  • vals – A column of values for the new variable.

analyze_quick_results(estimand_types, show_tables, show_heatmaps, show_validation, show_largest_effects, generate_pdf_report)

Summarizes the result of quick analyses for further analysis.

Parameters
  • estimand_types – A list of strings indicating the types of causal effects.

  • show_tables – A boolean indicating if the resulting causal estimates should be displayed in tabular form.

  • show_heatmaps – A boolean indicating if the resulting causal estimates should be displayed and saved in heatmap form.

  • show_validation – A boolean indicating if the resulting causal estimates should be compared to previous expectations.

  • show_largest_effects – A boolean indicating if the largest causal effects should be listed.

  • generate_pdf_report – A boolean indicating if the causal graph, heatmaps, validations and estimates should be written to files and combined into a pdf.

binarize_variable(name, one_val, zero_val=None)

Transforms a variable to a binary variable.

Parameters
  • name – A string indicating the name of the target variable.

  • one_val – The value that should be translated to 1.

  • zero_val – Optional; the value that should be translated to 0. Use None if everything except for one_val should be translated to 0. Defaults to None.

check_robustness(method_name, verbose=True, **kwargs)

Checks the robustness of the estimated causal effects.

Parameters
  • method_name – The name of the robustness check to be used.

  • verbose – Optional; A boolean indicating if verbose output should be displayed. Defaults to True.

  • **kwargs – Advanced parameters for the analysis. Please refer to https://microsoft.github.io/dowhy/ for more information.

combine_variables(name, input_cols, func, keep_old=True)

Combines data from existing variables into a new variable.

Parameters
  • name – A string indicating the name of the new variable.

  • input_cols – A list containing the names of the variables that are used for generating the new variable.

  • func – A function describing how the new variable is calculated from the input variables.

  • keep_old – Optional; A boolean indicating if we want to keep the input variables in our data. Defaults to True.

compare_to_noncausal_regression(input_cols, drop_cols=False)

Prints a comparison of the causal estimate to a noncausal linear regression estimate.

Parameters
  • input_cols – A set of columns to be used in the linear regression.

  • drop_cols – Optional; A boolean indicating if input_cols should indicate which columns to drop instead of which columns to use. Defaults to False.
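
The two readings of input_cols can be illustrated with a small column-selection helper. The function name and the list-based column handling are assumptions of this sketch; it only shows how drop_cols flips the meaning of input_cols, not how the regression itself is run.

```python
def select_regression_columns(all_cols, input_cols, drop_cols=False):
    """Pick the regressors, reading input_cols as a keep-list or a drop-list."""
    if drop_cols:
        # input_cols names the columns to leave out.
        return [col for col in all_cols if col not in input_cols]
    # input_cols names the columns to use.
    return [col for col in all_cols if col in input_cols]


all_cols = ["age", "education", "income", "health"]
print(select_regression_columns(all_cols, {"age", "education"}))
print(select_regression_columns(all_cols, {"health"}, drop_cols=True))
```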

delete_variable(name)

Deletes a variable from the data.

Parameters

name – A string indicating the name of the target variable.

erase_quick_results()

Erases stored results from quick analyses.

estimate_effect(method_name, verbose=True, **kwargs)

Estimates the causal effect from the statistical estimand and the data.

Parameters
  • method_name – The name of the estimation method to be used.

  • verbose – Optional; A boolean indicating if verbose output should be displayed. Defaults to True.

  • **kwargs – Advanced parameters for the analysis. Please refer to https://microsoft.github.io/dowhy/ for more information.

generate_pdf_report(dpi=(300, 300))

Generates a pdf report with the causal graph and all results.

Parameters

dpi – Optional; A pair indicating the resolution. Defaults to (300, 300).

get_quick_result_estimate(treatment, outcome, estimand_type)

Returns a stored estimated effect.

Parameters
  • treatment – A string indicating the name of the treatment variable.

  • outcome – A string indicating the name of the outcome variable.

  • estimand_type – A string indicating the type of causal effect.

identify_estimand(verbose=True, **kwargs)

Algebraically identifies a statistical estimand for the causal effect from the graph.

Parameters
  • verbose – Optional; A boolean indicating if verbose output should be displayed. Defaults to True.

  • **kwargs – Advanced parameters for the analysis. Please refer to https://microsoft.github.io/dowhy/ for more information.

imitate_data_trafos(vals_list=None)

Imitates all the preprocessing steps applied before causal discovery.

Parameters

vals_list – A list containing one column of values for each ‘add_variable’ step in the transformations. Defaults to None.

initialize_model(treatment, outcome, estimand_type, **kwargs)

Initializes the causal model.

Parameters
  • treatment – A string indicating the name of the treatment variable.

  • outcome – A string indicating the name of the outcome variable.

  • estimand_type – A string indicating the type of causal effect.

  • **kwargs – Advanced parameters for the analysis. Please refer to https://microsoft.github.io/dowhy/ for more information.

normalize_variable(name)

Replaces a variable by its z-scores.

Parameters

name – A string indicating the name of the target variable.

normalize_variables()

Replaces data for all variables by their z-scores.

read_csv(**kwargs)

Reads data from a csv file.

read_parquet(**kwargs)

Reads data from a parquet file.

rename_variable(current_name, new_name)

Renames a variable in the data.

Parameters
  • current_name – A string indicating the current name of the variable.

  • new_name – A string indicating the desired new name of the variable.

run_all_quick_analyses(estimand_types=['nonparametric-ate', 'nonparametric-nde', 'nonparametric-nie'], verbose=False, show_tables=True, show_heatmaps=True, show_validation=True, show_largest_effects=True, generate_pdf_report=True)

Performs all possible quick causal analyses with preset parameters.

Parameters
  • estimand_types – A list of strings indicating the types of causal effects.

  • verbose – Optional; A boolean indicating if verbose output should be displayed for each analysis. Defaults to False.

  • show_tables – Optional; A boolean indicating if the resulting causal estimates should be displayed in tabular form. Defaults to True.

  • show_heatmaps – Optional; A boolean indicating if the resulting causal estimates should be displayed and saved in heatmap form. Defaults to True.

  • show_validation – Optional; A boolean indicating if the resulting causal estimates should be compared to previous expectations. Defaults to True.

  • show_largest_effects – Optional; A boolean indicating if the largest causal effects should be listed. Defaults to True.

  • generate_pdf_report – Optional; A boolean indicating if the causal graph, heatmaps, validations and estimates should be written to files and combined into a pdf. Defaults to True.

run_multiple_quick_analyses(treatments, outcomes, estimand_types, verbose=False, show_tables=True, show_heatmaps=True, show_validation=True, show_largest_effects=True, generate_pdf_report=True)

Performs multiple quick causal analyses with preset parameters.

Parameters
  • treatments – A list of strings indicating the names of the treatment variables.

  • outcomes – A list of strings indicating the names of the outcome variables.

  • estimand_types – A list of strings indicating the types of causal effects.

  • verbose – Optional; A boolean indicating if verbose output should be displayed for each analysis. Defaults to False.

  • show_tables – Optional; A boolean indicating if the resulting causal estimates should be displayed in tabular form. Defaults to True.

  • show_heatmaps – Optional; A boolean indicating if the resulting causal estimates should be displayed and saved in heatmap form. Defaults to True.

  • show_validation – Optional; A boolean indicating if the resulting causal estimates should be compared to previous expectations. Defaults to True.

  • show_largest_effects – Optional; A boolean indicating if the largest causal effects should be listed. Defaults to True.

  • generate_pdf_report – Optional; A boolean indicating if the causal graph, heatmaps, validations and estimates should be written to files and combined into a pdf. Defaults to True.

run_quick_analysis(treatment, outcome, estimand_type, robustness_method=None, verbose=True)

Performs a quick causal analysis with preset parameters.

Parameters
  • treatment – A string indicating the name of the treatment variable.

  • outcome – A string indicating the name of the outcome variable.

  • estimand_type – A string indicating the type of causal effect.

  • robustness_method – Optional; A string indicating the robustness check to be used. Defaults to None.

  • verbose – Optional; A boolean indicating if verbose output should be displayed. Defaults to True.

Raises

KeyError – ‘estimand_type must be nonparametric-ate, nonparametric-nde or nonparametric-nie’

show_heatmaps(save=True)

Shows heatmaps for strengths of causal effects.

Parameters

save – Optional; A boolean indicating if the result should be saved to png. Defaults to True.

show_largest_effects(estimand_type, n_results=10, save=True)

Shows the largest causal effects in decreasing order.

Parameters
  • estimand_type – A string indicating the type of causal effect.

  • n_results – Optional; An integer indicating the number of effects to be shown. Defaults to 10.

  • save – Optional; A boolean indicating if the result should be saved to png. Defaults to True.
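
A possible way to picture the ranking: sort the stored estimates in decreasing order and keep the first n_results entries. Ranking by absolute value, and the dictionary keyed by (treatment, outcome) pairs, are assumptions of this sketch rather than documented behaviour.

```python
def largest_effects(effects, n_results=10):
    """Return the n_results entries with the largest effects, in decreasing order."""
    # Assumption: "largest" refers to the absolute size of the effect.
    ranked = sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:n_results]


effects = {("a", "y"): 0.2, ("b", "y"): -1.5, ("c", "y"): 0.7}
top = largest_effects(effects, n_results=2)
print(top)
```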

show_quick_result_methods(treatment, outcome, estimand_type)

Shows methodological information about the result of a quick analysis.

Parameters
  • treatment – A string indicating the name of the treatment variable.

  • outcome – A string indicating the name of the outcome variable.

  • estimand_type – A string indicating the type of causal effect.

show_quick_results(save=True)

Shows all results from quick analyses in tabular form.

show_validation(save=True)

Shows if selected estimated effects match previous expectations.

Parameters

save – Optional; A boolean indicating if the result should be saved to png. Defaults to True.

class estimator.EstimatorDatabricks(paths, spark, transformations=[], validation_dict={})

Main class for estimating causal effects on a Databricks cluster.

paths

A cause2e.PathManager managing paths and file names.

data

A pandas.DataFrame containing the data.

transformations

A list storing all the performed preprocessing transformations. Ensures that the data fits the causal graph.

variables

A set containing the names of all variables in the data.

model

A dowhy.causal_model.CausalModel that can identify, estimate and refute causal effects.

treatment

A string indicating the most recent treatment variable.

outcome

A string indicating the most recent outcome variable.

estimand_type

A string indicating the most recent type of causal effect.

estimand

A dowhy.causal_identifier.IdentifiedEstimand indicating the most recent estimand.

estimated_effect

A dowhy.causal_estimator.CausalEstimate indicating the most recent estimated effect.

robustness_info

A dowhy.causal_refuter.CausalRefutation indicating the results of the most recent robustness check.

spark

Optional; A pyspark.sql.SparkSession in case you want to use spark. Defaults to None.

str_report

A string that is used to show the pdf report.

generate_pdf_report(dpi=(300, 300))

Generates a pdf report with the causal graph and all results.

Parameters

dpi – Optional; A pair indicating the resolution. Defaults to (300, 300).

knowledge.py

This module handles domain knowledge representation and verification.

It transforms knowledge about the data generating process into constraints on the edges of the causal graph. It also verifies if a given causal graph respects a set of domain knowledge constraints.

class knowledge.EdgeCreator

Main class for creating required and forbidden edges from domain knowledge.

forbidden_edges

A set of edges that must not appear in the causal graph.

required_edges

A set of edges that must appear in the causal graph.

forbid_edge(source, destination)

Forbids an edge between two nodes.

Parameters
  • source – A string indicating the source node of the forbidden edge.

  • destination – A string indicating the destination node of the forbidden edge.

forbid_edges(edges)

Forbids multiple edges.

Parameters

edges – A set of edges.

forbid_edges_from_groups(group, incoming={}, outgoing={}, exceptions={})

Forbids edges between groups of variables.

Parameters
  • group – A set containing variables.

  • incoming – Optional; a set containing all variables that cannot affect variables in ‘group’. Defaults to {}.

  • outgoing – Optional; a set containing all variables that cannot be affected by variables in ‘group’. Defaults to {}.

  • exceptions – Optional; a set of edges that should not be forbidden even if the group structure entails it. Defaults to {}.
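
One way to read the group parameters is as two Cartesian products minus the exceptions, as in the following sketch. The helper is hypothetical and the representation of edges as (source, destination) tuples is an assumption.

```python
from itertools import product


def forbidden_from_groups(group, incoming=frozenset(), outgoing=frozenset(),
                          exceptions=frozenset()):
    """Build forbidden edges into and out of a group of variables."""
    # No variable in 'incoming' may affect a variable in 'group' ...
    edges = set(product(incoming, group))
    # ... and no variable in 'group' may affect a variable in 'outgoing'.
    edges |= set(product(group, outgoing))
    return edges - set(exceptions)


forbidden = forbidden_from_groups(
    group={"age"},
    incoming={"income", "education"},
    exceptions={("education", "age")},
)
print(forbidden)
```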

forbid_edges_from_temporal(temporal_order)

Finds all pairs of variables such that the first variable cannot causally affect the second variable for temporal reasons.

Parameters

temporal_order – A list of variable sets indicating the temporal order in which the variables were generated. This is used to infer forbidden edges since the future cannot cause the past.
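
The inference from a temporal order can be sketched as follows: every edge pointing from a later set to an earlier set is forbidden. The helper and its tuple-based return format are assumptions made for illustration.

```python
from itertools import product


def forbidden_from_temporal(temporal_order):
    """Forbid every edge that points from a later variable to an earlier one."""
    forbidden = set()
    for i, earlier in enumerate(temporal_order):
        for later in temporal_order[i + 1:]:
            # The future cannot cause the past.
            forbidden |= set(product(later, earlier))
    return forbidden


order = [{"birth_year"}, {"education"}, {"income"}]
print(forbidden_from_temporal(order))
```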

forbid_edges_within_group(group)

Forbids edges within one group of variables.

Parameters

group – A set containing variables that cannot affect each other.
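
For a single group, this presumably corresponds to all ordered pairs of distinct variables, which the sketch below builds with itertools.permutations. The helper is hypothetical and the tuple representation of edges is an assumption.

```python
from itertools import permutations


def forbidden_within_group(group):
    """Forbid edges in both directions between all distinct group members."""
    return set(permutations(group, 2))


print(forbidden_within_group({"x", "y", "z"}))
```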

forget_edges()

Forgets all the previously created edges to allow a new start.

require_edge(source, destination)

Requires an edge between two nodes.

Parameters
  • source – A string indicating the source node of the required edge.

  • destination – A string indicating the destination node of the required edge.

require_edges(edges)

Requires multiple edges.

Parameters

edges – A set of edges.

require_edges_from_groups(group, incoming={}, outgoing={}, exceptions={})

Requires edges between groups of variables.

Parameters
  • group – A set containing variables.

  • incoming – Optional; a set containing all variables that must affect variables in ‘group’. Defaults to {}.

  • outgoing – Optional; a set containing all variables that must be affected by variables in ‘group’. Defaults to {}.

  • exceptions – Optional; a set of edges that should not be required even if the group structure entails it. Defaults to {}.

show_edges()

Shows all currently required/forbidden edges.

class knowledge.KnowledgeChecker(edges, knowledge=None)

Main class for checking that a causal graph respects constraints from domain knowledge.

existing

A set of all edges that exist in the causal graph under consideration.

forbidden

A set of all edges that contradict domain knowledge.

required

A set of all edges that must exist according to domain knowledge.

respects_forbidden()

Returns True if no forbidden edges are present, else raises an AssertionError.

respects_knowledge()

Returns a boolean indicating if all domain knowledge is respected.

respects_required()

Returns True if all required edges are present, else raises an AssertionError.

class knowledge.ValidationCreator

Main class for recording expectations about causal effects that are validated after estimation.

expected_effects

A dictionary containing expected quantitative causal effects. This is evaluated after estimation of the effects.
