
GraphML 101: Intro to Graph Machine Learning (GraphML), Graph Neural Networks (GNNs), and Large Language Models (LLMs)

In the rapidly evolving landscape of artificial intelligence, three distinct but increasingly interconnected fields are charting the future of data analysis and reasoning: Graph Machine Learning (GraphML), Graph Neural Networks (GNNs), and Large Language Models (LLMs). While LLMs have captured the public imagination with their mastery of human language, the synergy between these models and the silent revolution of Graph ML represents one of the most promising frontiers in AI research. This article serves as a foundational introduction to these pillars of modern AI, exploring how the convergence of LLMs' reasoning power with GNNs' ability to unlock insights from complex, relationship-driven data is creating unprecedented opportunities.

Graph Network

At a technical level, GNNs provide a powerful framework for learning representations of relational data, enabling models to understand the intricate connections within networks. LLMs, on the other hand, have demonstrated a remarkable ability to process and generate human-like text by capturing vast amounts of semantic knowledge. We will delve into why the combination of these technologies is so potent, exploring how GNNs can provide the structured, network-based reasoning that LLMs sometimes lack, and how LLMs can, in turn, enrich the features and context used in graph-based models, leading to more powerful and nuanced applications across science, technology, and business.



1. Introduction to Graphs and Graph Machine Learning (GML, GraphML)


A graph is a fundamental data structure representing "items linked by relations". These items are called nodes (or vertices), and their connections are called edges (or links). Graphs are ubiquitous and can model various real-world scenarios, including:

  • Social networks: Users as nodes, connections as edges.

  • Molecules: Atoms as nodes, chemical bonds as edges.

  • Knowledge graphs: Entities as nodes, relationships as edges.

  • Citation networks: Papers/authors as nodes, citations as edges.

  • Text/NLP: Words/tokens as nodes, semantic relationships as edges.

  • 3D meshes: Points as nodes, connections as edges.

Graph Structure

Graphs can be further characterized by:

  • Directionality: Directed graphs have edges that flow in a specific direction (e.g., A -> B), while undirected graphs have symmetric relationships (A <-> B).

  • Weights: Weighted graphs associate a numerical value (weight) with each edge, indicating strength or cost.

  • Attributes: Nodes, edges, and the entire graph can store information in the form of scalars or embeddings.

  • Complexity: Multigraphs allow multiple edges (possibly of different types) between the same pair of nodes, and hypergraphs allow a single edge (a hyperedge) to connect more than two nodes. (A short code sketch follows this list.)
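To make these definitions concrete, the short sketch below builds a small directed, weighted, attributed graph with NetworkX. It assumes the networkx package is installed; the node names, relation labels, and attribute values are purely illustrative.

```python
import networkx as nx

# A small directed, weighted graph with node and edge attributes.
G = nx.DiGraph()

# Nodes carry attribute dictionaries (here, a toy "role" feature).
G.add_node("alice", role="author")
G.add_node("bob", role="reviewer")
G.add_node("carol", role="author")

# Edges are directed (A -> B) and carry a numeric weight plus a type label.
G.add_edge("alice", "bob", weight=0.9, relation="cites")
G.add_edge("carol", "bob", weight=0.4, relation="cites")
G.add_edge("bob", "alice", weight=0.7, relation="reviews")

# Basic structural queries: degree, neighbors, and edge data.
print(G.out_degree("alice"))                # 1
print(list(G.successors("carol")))          # ['bob']
print(G.edges["alice", "bob"]["weight"])    # 0.9
```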



2. Graph Neural Networks (GNNs)


Graph Neural Networks (GNNs) are a specialized class of artificial neural networks designed to process and learn from graph-structured data. They are built upon the concept of message passing, where nodes iteratively update their representations by aggregating information from their neighbors.


2.1 Core Concepts of GNNs


  • Message Passing Neural Networks (MPNNs): A foundational framework for GNNs. In an MPNN, "graph nodes update their representations by aggregating the messages received from their neighbours." A minimal code sketch of one such layer appears at the end of this subsection. This process involves:

    • Message function ($\psi$): Computes messages from neighboring nodes and their connecting edges.

    • Aggregation operator ($\bigoplus$): Gathers messages from all neighbors in a permutation-invariant manner (e.g., sum, mean, max).

    • Update function ($\phi$): Combines the node's current features with the aggregated messages to produce a new node representation.

  • Stacking Layers: GNNs typically consist of multiple layers, allowing nodes to aggregate information from increasingly distant neighbors. "Stacking n MPNN layers means that one node will be able to communicate with nodes that are at most n 'hops' away". This can be viewed as operating on learned embeddings of subgraphs.

  • Expressive Power: This refers to a GNN's ability to distinguish between non-isomorphic graphs.

GNN

Many MPNNs are limited to the 1-Weisfeiler-Leman (1-WL) algorithm's separation power.

More expressive GNNs, such as those equivalent to k-dimensional WL algorithms (k-WL), or those using random features or subgraph counts, have been developed, often at the cost of more computational resources.


New measures of expressivity are emerging, such as "mixing," which encodes the joint and nonlinear dependence of a graph function on pairs of nodes' features, particularly in the context of over-squashing.
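As the minimal sketch promised above, the following pure-PyTorch snippet implements a single MPNN-style layer with a sum aggregator. The layer sizes and the use of simple linear maps for the message function ψ and the update function φ are illustrative assumptions, not a reference implementation.

```python
import torch
import torch.nn as nn

class SimpleMPNNLayer(nn.Module):
    """One message-passing step: message (psi), sum aggregation, update (phi)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.message_fn = nn.Linear(2 * in_dim, out_dim)        # psi: acts on (h_u, h_v) pairs
        self.update_fn = nn.Linear(in_dim + out_dim, out_dim)   # phi: combines h_v with the aggregate

    def forward(self, x, edge_index):
        # x: [num_nodes, in_dim] node features
        # edge_index: [2, num_edges] with rows (source u, target v)
        src, dst = edge_index
        # psi: compute a message for every edge from the source and target features.
        messages = self.message_fn(torch.cat([x[src], x[dst]], dim=-1))
        # Sum aggregation (permutation-invariant): scatter messages onto target nodes.
        agg = torch.zeros(x.size(0), messages.size(-1))
        agg.index_add_(0, dst, messages)
        # phi: update each node from its current features and aggregated messages.
        return torch.relu(self.update_fn(torch.cat([x, agg], dim=-1)))

# Toy usage: 4 nodes, 3 directed edges, 8-dimensional features.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
layer = SimpleMPNNLayer(8, 16)
print(layer(x, edge_index).shape)  # torch.Size([4, 16])
```

Stacking several such layers, as described above, lets a node's representation depend on neighbors up to that many hops away.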


2.2 Types of GNN Architectures


  • Graph Convolutional Networks (GCNs): A popular type of GNN where node representations are updated by combining their own features with aggregated features from their neighbors, often using a normalized adjacency matrix. A limitation is that they "do not allow multidimensional edge features" directly. A minimal layer sketch follows this list.

    GCN
  • Graph Attention Networks (GATs): Introduce an attention mechanism to assign varying importance (attention coefficients) to neighboring nodes when aggregating messages. This allows the model to "focus on the important information from the data instead of focusing on the whole data". Attention coefficients measure "how important is node u to node v" and are normalized via a softmax function. GCNs can be viewed as a special case of GATs where attention coefficients are fixed.

    GAT
  • Gated Graph Sequence Neural Networks (GGS-NNs): Extend the GNN formulation by incorporating a Gated Recurrent Unit (GRU) cell for updating node representations, allowing for sequential outputs.

    GGS-NN
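The sketch below shows the core of a GCN layer in pure PyTorch: node features are mixed through a symmetrically normalized adjacency matrix (with self-loops) and then passed through a learned linear transform. It is a minimal illustration of the propagation rule rather than a full implementation, and fixing the neighbor weights this way is exactly the sense in which a GCN can be seen as a GAT with constant attention coefficients.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)  -- the standard GCN propagation rule."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):
        # adj: [num_nodes, num_nodes] dense adjacency (1 where an edge exists).
        a_hat = adj + torch.eye(adj.size(0))          # add self-loops
        deg = a_hat.sum(dim=1)                        # node degrees
        d_inv_sqrt = torch.diag(deg.pow(-0.5))        # D^{-1/2}
        norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt    # symmetric normalization
        return torch.relu(norm_adj @ self.linear(x))  # aggregate, transform, activate

# Toy usage: a 3-node path graph with 4-dimensional features.
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
x = torch.randn(3, 4)
print(SimpleGCNLayer(4, 2)(x, adj).shape)  # torch.Size([3, 2])
```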

2.3 Challenges in GNNs


  • Over-smoothing: Node representations can become indistinguishable after many layers. Countermeasures include skip connections, gated update rules, and jumping knowledge.

  • Over-squashing: "A bottleneck that is created by squeezing long-range dependencies into fixed-size representations". This occurs when tasks depend on interactions between nodes with "large commute time" and can obstruct expressive power. Modifying the final layer to be fully-adjacent can mitigate this. Graph rewiring is emerging as a valid approach to address over-squashing.

  • Heterophily: The issue of learning on graphs where connected nodes have dissimilar features or labels. This is an active area of research.

  • Generalization/Transferability: Ensuring GNNs perform well on unseen graphs or domains.

  • Efficiency and Scalability: Training GNNs on "enterprise-scale graphs with tens of billions of edges" requires specialized frameworks like GraphStorm, which is built on PyTorch and can leverage multiple GPUs and machines.

  • Sampling and Batching: Due to the variability in graph structure, creating mini-batches for training is challenging. Strategies involve sampling subgraphs that preserve essential properties, as sketched below.
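As a sketch of the sampling idea in the last point, the snippet below draws a fixed number of neighbors per seed node to form the edges of a small subgraph for one mini-batch. The fan-out value and the adjacency-list format are illustrative assumptions, not a particular framework's API.

```python
import random

def sample_neighbors(adj_list, seed_nodes, fanout=2, seed=0):
    """Uniformly sample up to `fanout` neighbors of each seed node for one mini-batch."""
    rng = random.Random(seed)
    sampled_edges = []
    for node in seed_nodes:
        neighbors = adj_list.get(node, [])
        chosen = rng.sample(neighbors, min(fanout, len(neighbors)))
        sampled_edges.extend((node, nbr) for nbr in chosen)
    return sampled_edges

# Toy adjacency list: node -> list of neighbors.
adj_list = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 4], 4: [1, 3]}
print(sample_neighbors(adj_list, seed_nodes=[0, 4], fanout=2))
```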



3. Tasks and Applications of GNNs


GNNs are applied across various levels of graph analysis:

  • Node-level tasks: Predicting properties or roles of individual nodes.

    • Examples: Node classification (e.g., classifying members in Zachary's karate club based on loyalty; see the sketch after this list), predicting 3D coordinates of atoms in a molecule (AlphaFold).

  • Edge-level tasks: Predicting properties of existing edges or missing edges.

    • Examples: Drug side effect prediction (predicting adverse effects between drug pairs), link prediction in recommender systems.

  • Graph-level tasks: Predicting properties of the entire graph.

    • Examples: Molecular property prediction (e.g., toxicity, odor prediction for molecules), graph classification (e.g., image classification where images are represented as graphs of pixels).

  • Subgraph-level tasks: Identifying communities or predicting properties of subgraphs.

    • Examples: Community detection in social networks, estimating arrival times in itinerary systems.
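As a concrete node-level example tied to the karate-club task mentioned above, the sketch below trains a two-layer GCN to classify club members. It assumes the torch_geometric (PyTorch Geometric) package is available; the hidden size, learning rate, and epoch count are arbitrary choices.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import KarateClub
from torch_geometric.nn import GCNConv

data = KarateClub()[0]  # 34 members (nodes), friendships (edges), community labels

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(data.num_node_features, 16)
        self.conv2 = GCNConv(16, int(data.y.max()) + 1)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train on the few labeled nodes (train_mask) and predict labels for the rest.
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

pred = model(data.x, data.edge_index).argmax(dim=1)
print((pred == data.y).float().mean())  # overall label agreement
```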


Key Application Domains:

  • Drug Discovery & Molecular Science: Designing new molecules with specific properties, predicting molecular properties (e.g., toxicity, odor, chemical reactivity), chemical reaction prediction.

  • Recommender Systems: Enhancing textual attributes of users/items, link prediction, social recommendation.

  • AI for Science: Beyond molecules, GNNs are used in materials design and predicting the evolution of physical systems.

  • Robot Task Planning: Leveraging LLMs and graph information to plan robot actions, especially in complex scenarios.

  • Cybersecurity: Detecting and tracing host-based threats, network lateral movement.

  • Combinatorial Optimization: Approximating solutions to NP-hard problems, with applications such as chip design and placement.

  • Natural Language Processing (NLP): Text classification, question answering, machine translation, event extraction by leveraging graph-based text representations to capture semantic relationships.

  • Computer Vision: Representing images as graphs of patches to enhance feature extraction and image understanding.

  • Water Distribution Networks: Forecasting water demand, developing metamodels.

  • Social Networks: User classification, community detection.


GNNs can operate in transductive settings (training and predicting on a single, fixed graph, where test nodes are observed but unlabeled during training) or inductive settings (training and evaluating on different graphs).



4. Large Language Models (LLMs) and Graphs


The intersection of LLMs and Graph Machine Learning is a rapidly evolving field, leveraging the strengths of both paradigms.


4.1 LLMs for Graph Models


LLMs are being employed to address limitations and enhance various aspects of graph models:


  • Enhancing Feature Quality:

    • Enhancing Feature Representation: LLMs generate interpretations or augmented attributes for graph elements (e.g., SMILES notations for molecules, textual attributes for users/items in recommender systems); see the sketch after this list.

    • Generating Augmented Information: LLMs can produce more detailed descriptions or reasoning (e.g., user preferences, potential user types for items).

    • Aligning Feature Space: LLMs help align features from different modalities.

  • Solving Vanilla GNN Training Limitations: LLMs are explored for their ability to handle structural information in graphs, categorized by whether they ignore, implicitly use, or explicitly use structural information.

  • Addressing Heterophily and Generalization: LLMs can help alleviate challenges related to dissimilar node features and improve model generalizability.

  • Prompt Tuning: LLMs are used in conjunction with GNNs for prompt tuning, a technique to adapt pre-trained models to specific tasks with limited data.
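To illustrate the feature-enhancement idea above, the sketch below encodes LLM-generated textual descriptions of items into dense vectors and uses them as initial node features for a downstream GNN. The example descriptions are made up, and the snippet assumes the sentence-transformers package; the choice of encoder model is arbitrary.

```python
import torch
from sentence_transformers import SentenceTransformer

# Hypothetical LLM-generated descriptions for three items in a recommender graph.
item_descriptions = [
    "A lightweight trail-running shoe favored by beginners.",
    "A waterproof hiking boot aimed at multi-day treks.",
    "A casual sneaker often bought together with running socks.",
]

# Encode the text into dense vectors that can serve as node features.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
node_features = torch.tensor(encoder.encode(item_descriptions))  # shape [3, 384]

# These embeddings can now be fed to any GNN layer as the initial node features,
# e.g. the SimpleGCNLayer sketched earlier in this article.
print(node_features.shape)
```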


4.2 Graphs for LLMs


Knowledge Graphs (KGs) are crucial for mitigating pressing challenges in LLMs, such as "factuality awareness, hallucinations, limited explainability in the reasoning process". KGs store "high-quality, human-curated factual knowledge in a structured format".

An Example of a Knowledge Graph (KG)
  • KG-enhanced LLM Pre-training: KGs inject factual knowledge into LLMs during pre-training to improve their understanding and generation capabilities.

  • KG-enhanced LLM Inference: KGs provide a structured source of truth during the inference stage (a minimal sketch follows this list), helping LLMs:

    • Mitigate Hallucinations: By grounding LLM responses in factual data from KGs.

    • Improve Explainability: By enabling LLMs to derive citation information from KGs to support their answers and provide reasoning paths. Benchmarks like KaLMA assess this capability.

    • Enhance Reasoning: LLMs can perform complex reasoning tasks over KGs, such as KG completion, knowledge-graph question answering, and multi-hop reasoning over entities and relations.

    A Scenario of KG
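A minimal sketch of KG-enhanced inference: look up triples about the entities mentioned in a question and prepend them to the prompt, so the LLM answers from curated facts rather than from memory. The tiny in-memory triple store and the prompt template are illustrative assumptions; a real system would query an actual KG and pass the prompt to an actual LLM.

```python
# Toy knowledge graph as (subject, relation, object) triples.
KG = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("warfarin", "is_a", "anticoagulant"),
]

def retrieve_facts(entities):
    """Return every triple whose subject or object matches a mentioned entity."""
    return [t for t in KG if t[0] in entities or t[2] in entities]

def build_grounded_prompt(question, entities):
    facts = "\n".join(f"- {s} {r.replace('_', ' ')} {o}" for s, r, o in retrieve_facts(entities))
    return (
        "Answer using only the facts below and cite them.\n"
        f"Facts:\n{facts}\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "Can aspirin be taken with warfarin?", entities={"aspirin", "warfarin"}
)
print(prompt)  # this grounded prompt would then be sent to the LLM of choice
```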

4.3 Key LLM Models and their Integration


Various large models, including ChatGPT, LLaMA, GPT-4, Vicuna, PaLM, and BERT on the language side, as well as ViT on the vision side, are being integrated with GNNs and KGs across diverse applications. Frameworks like GraphGPT and GraphLLM specifically combine Graph Transformers with LLMs.

LLM KG Builder Design - Front-End (Example)

5. Future Directions


The integration of LLMs and Graph Machine Learning is still in an exploratory stage, with promising future directions:


  • Generalization and Transferability: Further research to enable GNNs enhanced by LLMs to generalize across different datasets and domains.

  • Multi-modal Graph Learning: Combining graph data with other modalities (e.g., text, images) using LLMs.

  • Trustworthiness: Ensuring robustness against adversarial attacks, providing explainability for decisions, promoting fairness, and maintaining privacy, especially in critical applications like healthcare and finance.

  • Expressiveness on Relevant Graph Classes: While most expressive GNNs aim for general graphs, there's a growing need to develop architectures tailored for specific, practically relevant graph classes (e.g., planar graphs for molecules, bipartite graphs for optimization problems).

  • Generative Models for Graphs: Developing models that can generate new graphs with desired properties, such as novel molecular structures for drug design.
