Probabilistic Graphical Modeling / Network Data Analysis


Developed an analysis model that can be applied to complex link structures through Transfer Path Analysis (TPA) in suspension link units.


Latent graph structure pooling with a hierarchical graph context representation.

In graph data analysis, particularly graph classification, a discriminative graph-level representation is important for classification performance. To improve performance, recent studies have applied pooling methods to graph neural networks.
However, existing pooling approaches lose the initial graph structural information when aggregating nodes. Given a latent structure obtained from the pooling operation, the nodes in that latent structure differ in significance from those in the original graph. This discrepancy in structural information between the initial and latent structures leads to an inadequate graph representation when existing methods generate the graph-level result. Motivated by this, we propose latent graph structure pooling with a hierarchical graph context representation.
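For readers unfamiliar with pooling, the following minimal sketch shows one generic pooling operation, top-k node selection, and how it yields a smaller latent structure. It is an illustration only and is not the pooling method proposed here; the function name `top_k_pool` and the node scores are hypothetical.

```python
def top_k_pool(adjacency, scores, k):
    """Keep the k highest-scoring nodes and the edges among them.

    Generic top-k pooling for illustration only (an assumption, not the
    proposed method). `adjacency` is a square 0/1 matrix as nested lists.
    """
    # Rank node indices by score, highest first, and keep the first k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    # The latent structure is the subgraph induced by the kept nodes.
    pooled = [[adjacency[i][j] for j in keep] for i in keep]
    return keep, pooled


# Path graph 0-1-2-3 with hypothetical node scores.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
kept_nodes, latent_adjacency = top_k_pool(adjacency, [0.1, 0.9, 0.8, 0.2], k=2)
```

Here nodes 1 and 2 are kept, and the latent structure retains only the edge between them. Nodes 0 and 3 and their edges are discarded, which is one way the initial structural information can be lost.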

The context attention module emphasized relatively significant nodes in the given graph. The original graph was sampled from the IMDB-BINARY dataset, and the latent structure was extracted from it. The visualization showed a contextual difference between the original graph and the latent structure. At the original graph level, a few nodes were consistently significant across the context blocks. In contrast, the latent level focused strongly on two nodes in the latent context. Even though the latent structure had the same connections as the original, it emphasized different contextual information.


Graph Embedding in Non-Euclidean Space

Objective

Depending on the data domain, different metrics are appropriate, such as the Euclidean distance, the cosine similarity, and the geodesic distance. In other words, the embedding space need not be Euclidean. Although data from many areas are represented as graph-structured data, non-Euclidean space is not yet readily adopted as a common embedding space in graph embedding.
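As a rough illustration of how these metrics differ, the sketch below computes all three for one pair of vectors. The function names are hypothetical, and the geodesic distance is taken on the unit sphere as one assumed example of a curved space.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sphere_geodesic_distance(u, v):
    """Great-circle distance between two unit vectors (assumed example)."""
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.acos(max(-1.0, min(1.0, dot)))


u, v = (1.0, 0.0), (0.0, 1.0)
d_euclid = euclidean_distance(u, v)
s_cosine = cosine_similarity(u, v)
d_geodesic = sphere_geodesic_distance(u, v)
```

For these two orthogonal unit vectors, the Euclidean distance is sqrt(2), the cosine similarity is 0, and the geodesic distance along the unit circle is pi/2, so the same pair of points is measured differently depending on the geometry assumed.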

Data

We compare the reciprocals for each space on our experimental benchmark, which includes citation, molecule, and relation networks.

Fellbaum, C. (Ed.), 1998. WordNet: an electronic lexical database. MIT Press.

Related Work

Dobson, D.P., Doig, J.A., 2003. Distinguishing enzyme structures from non-enzymes without alignments. Journal of Molecular Biology, 771–783.

Namata, G., London, B., Getoor, L., Huang, B., 2012. Query-driven active surveying for collective classification, in: ICML Workshop on MLG.

McCallum, A., Nigam, K., Rennie, J., Seymore, K., 2000. Automating the construction of internet portals with machine learning. Inf. Retr. 3, 127–163. URL: https://doi.org/10.1023/A:1009953814988, doi:10.1023/A:1009953814988.

Borgwardt, K.M., Kriegel, H.P., 2005b. Shortest-path kernels on graphs, in: Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM 2005), IEEE Computer Society, Washington, DC, USA. pp. 74–81. URL: http://dx.doi.org/10.1109/ICDM.2005.132.

Proposed Method

We generalize graph embedding in a given space to the graph level as well as the node level. We also compare the reciprocals for each space on commonly used benchmarks and suggest how to choose an embedding space for a given type of data. From the experimental results, we contend that graph-level embedding in non-Euclidean space is superior to embedding in Euclidean space.

 Data is mainly represented in Euclidean space for a variety of reasons. One of the main reasons is that Euclidean space is more intuitive than other spaces. Linear algebra and vector structures were also researched based on Euclidean space. As a result, the forms of the distance and inner product, the tools of measurement we commonly use, are mostly defined in Euclidean space. Due to this strong familiarity associated with Euclidean space, Euclidean geometry research has been active in machine learning.

 An n-dimensional hyperbolic space is a more complex structure than Euclidean space. Therefore, to solve the optimization problem through a simpler approach, we suggest the use of the open unit ball, referred to as the Poincaré ball. The Poincaré ball is a conformal model of hyperbolic space. As in the case of the sphere, we can use a first-order approximation of the exponential map, called a retraction.
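The retraction on the Poincaré ball can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it follows the commonly used recipe of rescaling the Euclidean gradient by the inverse of the conformal metric factor, taking a plain step, and projecting back inside the ball. The function names, the learning rate `lr`, and the margin `EPS` are hypothetical.

```python
import math

EPS = 1e-5  # assumed safety margin that keeps points strictly inside the ball


def poincare_distance(u, v):
    """Hyperbolic distance between two points inside the unit ball."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def project_to_ball(x, eps=EPS):
    """Rescale x so that its norm is at most 1 - eps."""
    norm = math.sqrt(sum(a * a for a in x))
    max_norm = 1.0 - eps
    if norm >= max_norm:
        return [a * max_norm / norm for a in x]
    return list(x)


def poincare_retraction_step(x, euclid_grad, lr):
    """One gradient step using a retraction instead of the exponential map."""
    sq_x = sum(a * a for a in x)
    # Riemannian gradient: Euclidean gradient scaled by (1 - |x|^2)^2 / 4.
    scale = (1.0 - sq_x) ** 2 / 4.0
    moved = [a - lr * scale * g for a, g in zip(x, euclid_grad)]
    return project_to_ball(moved)


origin_to_half = poincare_distance([0.0, 0.0], [0.5, 0.0])
stepped = poincare_retraction_step([0.0, 0.0], [1.0, 0.0], lr=0.1)
```

The distance from the origin to (0.5, 0) evaluates to ln 3, and the step from the origin moves the point to (-0.025, 0). The projection matters only when a step would leave the ball, in which case the point is pulled back to just inside the boundary.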

 The unit sphere is the primary choice when solving optimization problems in spherical space. As with the Poincaré ball, a retraction, which is a first-order approximation, can be used on the unit sphere instead of the exponential map.
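A minimal sketch of such a retraction on the unit sphere is shown below, assuming the standard construction: project the Euclidean gradient onto the tangent space at the current point, take a step, and renormalize. The function name and the learning rate `lr` are hypothetical, and this is not necessarily the exact update used in this work.

```python
import math


def sphere_retraction_step(x, euclid_grad, lr):
    """One gradient step on the unit sphere using a normalization retraction."""
    # Remove the gradient component along x: g - <x, g> x.
    dot = sum(a * g for a, g in zip(x, euclid_grad))
    tangent = [g - dot * a for a, g in zip(x, euclid_grad)]
    # Step in the tangent direction, which leaves the sphere slightly.
    moved = [a - lr * t for a, t in zip(x, tangent)]
    # Retract by rescaling back to unit length.
    norm = math.sqrt(sum(m * m for m in moved))
    return [m / norm for m in moved]


x_new = sphere_retraction_step([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], lr=0.5)
x_new_norm = math.sqrt(sum(a * a for a in x_new))
```

The result stays on the unit sphere after every step. For small steps it agrees with the exponential map to first order while avoiding the trigonometric functions that the exact map requires.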


Artificial Intelligence Research for Monomer Design

[2021-08-01 ~ 2023-07-31, Samyang]

Research Goal: to develop artificial intelligence technology for designing polymers with desired properties and to create a structure-property database.

(1) Created a property database by collecting polymer data from public databases and literature.

  • Analyzed the properties of each database and developed an auto-collecting tool for easy and fast collection. The tool can also collect data that are updated later.

(2) Developed an artificial intelligence technology to analyze the relationship between molecular structures and properties.

  • Developed a new line notation for organic compounds and created a dataset that is required to train the artificial intelligence model.

(3) Explored the latent space to find relationships between molecular structures and properties.

  • Exploring a latent space that represents correlations among properties allows property analysis at low computational cost.
  • Attempted to apply a variational autoencoder (VAE).

(4) Generative Model-based Structure Prediction

  • Developed an artificial intelligence technology that predicts molecular structures with desired properties.