We use a coordinate system to describe a geometric object. A space is a set of objects: points, lines, triangles, numbers, sets, matrices, trees, graphs, relations, groups. A basis serves as the measure of a space; it can be a set of vectors or a set of functions, and an orthogonal basis has several good properties.
A distance (metric) is a function $d: V \times V \rightarrow \mathbb{R}$, where $V$ is a set, such that $\forall x, y, z \in V$:

- $d(x, y) \geq 0$ (non-negativity)
- $d(x, y) = 0 \iff x = y$ (identity of indiscernibles)
- $d(x, y) = d(y, x)$ (symmetry)
- $d(x, z) \leq d(x, y) + d(y, z)$ (triangle inequality)

$(V, d)$ is a metric space: a space equipped with a distance.
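These axioms can be spot-checked numerically on a finite sample of points; a minimal sketch (the helper name and the integer sample are my own illustration, and a pass is evidence, not a proof):

```python
import itertools
import random

def check_metric_axioms(d, points):
    # Test the four metric axioms on all triples from a finite sample.
    for x, y, z in itertools.product(points, repeat=3):
        assert d(x, y) >= 0                    # non-negativity
        assert (d(x, y) == 0) == (x == y)      # identity of indiscernibles
        assert d(x, y) == d(y, x)              # symmetry
        assert d(x, z) <= d(x, y) + d(y, z)    # triangle inequality

# e.g. absolute difference is a metric on the integers
check_metric_axioms(lambda x, y: abs(x - y),
                    [random.randint(-100, 100) for _ in range(8)])
```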
A pseudometric relaxes the identity of indiscernibles: it is a function $d: V \times V \rightarrow \mathbb{R}$, $V$ a set, satisfying the metric axioms except that $d(x, y) = 0$ is allowed for distinct $x, y \in V$ (while $d(x, x) = 0$ still holds). $(V, d)$ is a pseudometric space.
A quasimetric relaxes symmetry: it is a function $d: V \times V \rightarrow \mathbb{R}$, $V$ a set, satisfying the metric axioms except that $d(x, y) \neq d(y, x)$ is allowed for $x, y \in V$. $(V, d)$ is a quasimetric space.
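Two toy instances (illustrative examples, not from the source) that separate the three definitions:

```python
# Pseudometric on R^2: compare only the first coordinates.
# pseudo((0, 1), (0, 2)) == 0 although the points differ.
pseudo = lambda p, q: abs(p[0] - q[0])

# Quasimetric on R: moving in the negative direction costs double,
# so symmetry fails: quasi(0, 3) == 3 but quasi(3, 0) == 6.
# The remaining metric axioms still hold.
quasi = lambda x, y: (y - x) if y >= x else 2 * (x - y)
```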
A norm is a function $||\cdot||: V \rightarrow \mathbb{R}$, where $V$ is a vector space, such that $\forall x, y \in V$ and every scalar $a$:

- $||x|| \geq 0$, and $||x|| = 0 \iff x = 0$ (definiteness)
- $||ax|| = |a| \, ||x||$ (absolute homogeneity)
- $||x + y|| \leq ||x|| + ||y||$ (triangle inequality)

$(V, ||\cdot||)$ is a normed vector space. Every norm induces a metric $d(x, y) = ||x - y||$.
A seminorm relaxes definiteness: it satisfies the same axioms except that $||x|| = 0$ is allowed for some $x \neq 0$. $(V, ||\cdot||)$ is then a seminormed vector space.
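A quick sketch of the contrast, with the induced metric included (the Euclidean and first-coordinate examples are my own):

```python
import math

# The Euclidean norm on R^2 is a genuine norm: it is definite.
norm = lambda v: math.hypot(v[0], v[1])

# A seminorm on R^2: only the first coordinate counts, so
# seminorm((0.0, 5.0)) == 0 even though (0, 5) != (0, 0).
seminorm = lambda v: abs(v[0])

# Every norm induces a metric d(x, y) = ||x - y||.
d = lambda x, y: norm((x[0] - y[0], x[1] - y[1]))
```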
Pairwise squared distances before and after a mapping $x_i \mapsto y_i$: $d_{ij} = ||x_i - x_j||^2$, $\hat{d}_{ij} = ||y_i - y_j||^2$.
Let $X$ be a collection of finite sets; a distance is a function $d^p: X \times X \rightarrow \mathbb{R}$, with a normalized variant $d^p_N$.
$(X, d^p)$ is a metric space, and so is $(X, d^p_N)$.
Jaccard distance is the special case of $d^p_N$ when $p = 1$: $d^1_N(A, B) = |A \triangle B| \, / \, |A \cup B|$.
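A minimal sketch of the $p = 1$ normalized set distance, i.e. Jaccard (the empty-set convention is an assumption):

```python
def jaccard_distance(a: set, b: set) -> float:
    # d^1_N(A, B) = |A symmetric-difference B| / |A union B|
    if not a and not b:
        return 0.0  # convention: two empty sets coincide
    return len(a ^ b) / len(a | b)

assert jaccard_distance({1, 2, 3}, {2, 3, 4}) == 0.5  # |{1,4}| / |{1,2,3,4}|
```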
$\mathbb{R}^k$ is a vector space; a distance is a function $d^p: \mathbb{R}^k \times \mathbb{R}^k \rightarrow \mathbb{R}$,
$$ d^p(x, y) = \left( \sum_{i=1}^{k} |x_i - y_i|^p \right)^{1/p} $$
For $p \geq 1$, $(\mathbb{R}^k, d^p)$ is a metric space; Manhattan distance is the special case of $d^p$ when $p = 1$.
$(\mathbb{R}^k, d^p_N)$ is also a metric space.
Only when $p = 1$ do they both reduce to Manhattan distance.
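A sketch of $d^p$, assuming the standard Minkowski form given above (a metric only for $p \geq 1$):

```python
def minkowski_distance(x, y, p: float = 2.0) -> float:
    # d^p(x, y) = (sum_i |x_i - y_i|^p)^(1/p)
    # p = 1 gives Manhattan distance, p = 2 Euclidean.
    if p < 1:
        raise ValueError("the triangle inequality holds only for p >= 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

assert minkowski_distance([0, 0], [3, 4], p=2) == 5.0
assert minkowski_distance([0, 0], [3, 4], p=1) == 7.0
```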
$L(\mathbb{R})$ is a set of integrable functions; a distance is a function $d^p: L(\mathbb{R}) \times L(\mathbb{R}) \rightarrow \mathbb{R}$,
$$ d^p(f, g) = \left( \int_{\mathbb{R}} |f(x) - g(x)|^p \, dx \right)^{1/p} $$
$(L(\mathbb{R}), d^p)$ and $(L(\mathbb{R}), d^p_N)$ are metric spaces. (Strictly speaking, $d^p$ is a pseudometric here: $d^p(f, g) = 0$ only forces $f = g$ almost everywhere; identifying such functions yields a metric.)
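A numerical sketch of $d^p$ on functions, assuming both are negligible outside a finite window $[a, b]$ (the window, grid size, and function names are arbitrary choices of mine):

```python
import numpy as np

def lp_distance(f, g, p=2.0, a=-10.0, b=10.0, n=100_001):
    # Riemann-sum approximation of (integral |f - g|^p dx)^(1/p) on [a, b].
    x, dx = np.linspace(a, b, n, retstep=True)
    return (np.sum(np.abs(f(x) - g(x)) ** p) * dx) ** (1 / p)

# e.g. L^2 distance between two Gaussian bumps
print(lp_distance(lambda x: np.exp(-x**2), lambda x: np.exp(-(x - 1) ** 2)))
```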
An $f$-divergence is a class of divergences: a measure of dissimilarity between probability distributions. For a convex function $f$ with $f(1) = 0$,
$$ D_f(P||Q) = \int_{\Omega} f\left(\frac{dP}{dQ}\right) dQ $$
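For discrete distributions the integral becomes a sum, and choosing $f$ recovers familiar divergences; a sketch (the helper name is mine):

```python
import math

def f_divergence(p, q, f):
    # D_f(P || Q) = sum_i q_i * f(p_i / q_i) for discrete distributions,
    # assuming q_i > 0 wherever p_i > 0.
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q) if qi > 0)

# KL divergence: f(t) = t * log(t)   (convex, f(1) = 0)
kl = lambda p, q: f_divergence(p, q, lambda t: t * math.log(t) if t > 0 else 0.0)
# Total variation: f(t) = |t - 1| / 2
tv = lambda p, q: f_divergence(p, q, lambda t: abs(t - 1) / 2)

print(kl([0.5, 0.5], [0.9, 0.1]), tv([0.5, 0.5], [0.9, 0.1]))
```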
An ontology $\mathcal{O} = (V, E)$ is a directed acyclic graph (DAG). A subgraph $F \subseteq V$ is consistent if, for every vertex $v \in F$, every ancestor of $v$ is also in $F$.
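A minimal consistency check, assuming the ontology is given as a parent map (the representation is my choice):

```python
def is_consistent(F, parents):
    # F is consistent iff every vertex in F has all of its parents in F;
    # by induction, every ancestor is then in F as well.
    # `parents` maps each vertex to its set of parents in the DAG.
    return all(parents.get(v, set()) <= F for v in F)

# Toy ontology: root -> a -> b
parents = {"a": {"root"}, "b": {"a"}}
assert is_consistent({"root", "a", "b"}, parents)
assert not is_consistent({"b"}, parents)  # missing ancestors root, a
```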
The underlying probabilistic graphical model is related to a Bayesian network.
$P(v \mid parents(v))$ is the probability that node $v$ is part of an ontological annotation given all its parents.
The information accretion $ia(v) = -\log P(v \mid parents(v))$ is the additional information inherent to node $v$, assuming all its parents are already in the annotation.
Misinformation: the total information content of the nodes in the prediction $G$ that are not part of the true annotation $F$.
$$ mi(F, G) = \sum_{v \in G \setminus F} ia(v) $$
Remaining uncertainty: the total information content of the nodes in the true annotation $F$ that are not part of the prediction $G$.
$$ ru(F, G) = \sum_{v \in F \setminus G} ia(v) $$
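A sketch of both measures on a toy example; the $ia$ table is hypothetical, standing in for values derived from $P(v \mid parents(v))$:

```python
import math

def remaining_uncertainty(F, G, ia):
    # ru(F, G): information content of true nodes the prediction missed
    return sum(ia[v] for v in F - G)

def misinformation(F, G, ia):
    # mi(F, G): information content of predicted nodes not in the truth
    return sum(ia[v] for v in G - F)

# ia(v) = -log P(v | parents(v)); hypothetical toy values
ia = {"root": 0.0, "a": -math.log(0.6), "b": -math.log(0.3), "c": -math.log(0.1)}
F = {"root", "a", "b"}   # true annotation (a consistent subgraph)
G = {"root", "a", "c"}   # predicted annotation
print(remaining_uncertainty(F, G, ia), misinformation(F, G, ia))
```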