Visualization
Warning
The visualization module is currently in development. At present, only cluster_heatmap_plot is considered stable. Other functions may change in future releases.
- cluster_heatmap_plot(df, x, y, max_width=75)
Create a heatmap visualization of Likert scale responses grouped by clusters.
This function generates an interactive Altair visualization showing the distribution of positive and negative responses across different clusters for each question. The visualization consists of two parts:
1. A bar chart showing the number of respondents in each cluster
2. A heatmap showing the sentiment distribution for each question by cluster
- Parameters:
df (pd.DataFrame) – The DataFrame containing the clustered data and encoded Likert responses. Should include a cluster column and encoded Likert columns.
x (str) – The name of the column containing cluster IDs (e.g., "question_cluster_id").
y (List[str]) – List of column names containing the encoded Likert responses. These columns should typically hold the values -1, 0, and 1, representing negative, neutral, and positive responses respectively.
max_width (int, default=75) – Maximum width for wrapping question labels in the visualization.
- Returns:
An Altair chart object that combines a bar chart of cluster sizes with a heatmap of sentiment distribution. The chart can be displayed in a Jupyter notebook or exported as HTML.
- Return type:
alt.VConcatChart
Notes
The function color-codes the heatmap cells based on the percentage of positive and negative responses, with green representing positive sentiment, red representing negative sentiment, and varying shades for mixed responses.
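The percentages behind that color coding can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the helper name sentiment_shares and the sample rows are hypothetical, and it assumes that neutral responses count toward each cluster's total.

```python
from collections import defaultdict


def sentiment_shares(rows, cluster_key, question_key):
    """Return {cluster: (pct_positive, pct_negative)} for one encoded question.

    Hypothetical helper for illustration only. Assumes every row holds
    -1, 0, or 1 for the question and that neutrals count toward the total.
    """
    counts = defaultdict(lambda: {"pos": 0, "neg": 0, "total": 0})
    for row in rows:
        bucket = counts[row[cluster_key]]
        bucket["total"] += 1
        if row[question_key] > 0:
            bucket["pos"] += 1
        elif row[question_key] < 0:
            bucket["neg"] += 1
    return {
        cluster: (100 * c["pos"] / c["total"], 100 * c["neg"] / c["total"])
        for cluster, c in counts.items()
    }


# Made-up sample data: one encoded question, two clusters.
rows = [
    {"question_cluster_id": 0, "likert_encoded_q1": 1},
    {"question_cluster_id": 0, "likert_encoded_q1": 1},
    {"question_cluster_id": 0, "likert_encoded_q1": -1},
    {"question_cluster_id": 0, "likert_encoded_q1": 0},
    {"question_cluster_id": 1, "likert_encoded_q1": -1},
    {"question_cluster_id": 1, "likert_encoded_q1": 0},
]
shares = sentiment_shares(rows, "question_cluster_id", "likert_encoded_q1")
print(shares)  # {0: (50.0, 25.0), 1: (0.0, 50.0)}
```

In this sample, cluster 0 leans positive and cluster 1 leans negative, so under the stated assumption their cells would fall toward the green and red ends of the scale respectively.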
The encoded Likert columns (y parameter) should contain the following values:
* 1 for positive responses
* 0 for neutral responses
* -1 for negative responses
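If the raw survey data holds text labels rather than encoded values, a mapping along the following lines could produce the expected encoding. This is only a sketch: the five labels, the LIKERT_TO_SENTIMENT table, and the encode_likert helper are assumptions for illustration and are not part of this module.

```python
# Hypothetical 5-point scale; the labels in a real survey may differ.
LIKERT_TO_SENTIMENT = {
    "Strongly disagree": -1,
    "Disagree": -1,
    "Neutral": 0,
    "Agree": 1,
    "Strongly agree": 1,
}


def encode_likert(responses):
    """Map raw Likert labels to -1, 0, or 1.

    Labels that are missing or not in the table become None.
    """
    return [LIKERT_TO_SENTIMENT.get(response) for response in responses]


encoded = encode_likert(["Agree", "Neutral", "Strongly disagree", None])
print(encoded)  # [1, 0, -1, None]
```

With a pandas DataFrame, the same table can be applied to a column with Series.map(LIKERT_TO_SENTIMENT), which leaves unmapped labels as NaN.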
Examples
>>> # Assuming df has been processed with cluster_questions
>>> likert_columns = [f"likert_encoded_{q}" for q in questions]
>>> heatmap = cluster_heatmap_plot(df, x="question_cluster_id", y=likert_columns)
>>> display(heatmap)
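The max_width parameter controls how long question labels are wrapped. How the function wraps labels internally is not documented here, so the following is only a sketch of the general idea using the standard-library textwrap module. The wrap_label helper and the sample question are hypothetical.

```python
import textwrap


def wrap_label(label, max_width=75):
    """Split a long label into lines of at most max_width characters.

    Hypothetical helper: breaks on whitespace and keeps words intact.
    """
    return textwrap.wrap(label, width=max_width)


question = "How satisfied are you with the onboarding process overall?"
lines = wrap_label(question, max_width=30)
for line in lines:
    print(line)
```

A smaller max_width gives more, shorter lines per label, which can help when questions are long and the chart is narrow.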