{
"cells": [
{
"cell_type": "markdown",
"id": "f9a34f6c-7933-46c5-84c1-2a6a56c4e7ac",
"metadata": {},
"source": [
"# AEM-as-a-CloudService - CDN Cache Hit Ratio Analysis - Jupyter Notebook\n",
"\n",
"A quickstart sample Jupyter Notebook to analyze your AEMCS CDN logs. The analysis shows the **Cache HIT ratio** for the following request types and provides a list of the **top URLs of MISS and PASS cache types**.\n",
"\n",
"- Total \n",
"- HTML \n",
"- Image\n",
"- JavaScript\n",
"- CSS\n",
"- JSON\n",
"\n",
">\n",
"> **IMPORTANT::**\n",
"> Please change the value of `json_file` variable in the next block to analyze your CDN log file.\n",
">\n",
"\n",
"The CDN log file is downloaded from the 蜜豆视频 Cloud Manager. \n",
"\n",
"Feel free to edit this Jupyter Notebook file to gain further insight and optimize the CDN configurations."
]
},
{
"cell_type": "markdown",
"id": "52f3a431-d8fb-4a32-bc39-ee06fede8014",
"metadata": {},
"source": [
"## Install additional libraries "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0f931f94-7d17-44cf-b776-8f879853089c",
"metadata": {},
"outputs": [],
"source": [
"!pip install termcolor\n",
"!pip install tabulate"
]
},
{
"cell_type": "markdown",
"id": "a5f2839c-d850-4434-99a0-94a78f9a0500",
"metadata": {},
"source": [
"## Load CDN Log File"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c207e77",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import csv\n",
"import json\n",
"import re\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from termcolor import colored\n",
"\n",
"\n",
"# *\n",
"# IMPORTANT - Update the below `log_file` value with your log file name/path.\n",
"# *\n",
"log_file = 'publish_cdn_2023-09-22.log'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c8f322a",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Setting Pandas options\n",
"pd.set_option('display.max_columns', 100) \n",
"pd.set_option('display.max_colwidth', 100) \n",
"\n",
"data = []\n",
"col_names = ['timestamp', 'ttfb', 'cli_ip', 'cli_country', 'rid', 'req_ua', 'host', 'url', 'method', 'res_ctype', 'cache', 'res_age', 'status', 'pop', 'rules']\n",
"\n",
"# Patterns to ignore log lines\n",
"log_line_patterns_to_ignore = [r'(?:\\\"url\\\":\"\\/systemready\\\")', r'(?:\\\"cache\\\":\\\"SYNTH\\\")']\n",
"\n",
"# Read the log file\n",
"with open(log_file, encoding='utf-8') as in_file:\n",
" for line in in_file:\n",
" if not any(re.search(pattern, line) for pattern in log_line_patterns_to_ignore):\n",
" log = json.loads(line)\n",
" data.append(log)\n",
"\n",
"# Create DataFrame\n",
"df = pd.DataFrame(data=data, columns=col_names)\n",
"\n",
"# Display the first three rows, to review the CDN log details\n",
"df.head(3)"
]
},
{
"cell_type": "markdown",
"id": "7d18e1ec-35d9-4e13-bf3d-a52daeab9038",
"metadata": {},
"source": [
"## Perform Analysis\n",
"\n",
"### Display Analysis Result for Total, HTML, JS/CSS and Image Requests\n",
"\n",
"For each request type (HTML, JS/CSS, Image) the analysis result contains\n",
"- Cache Hit Ratio\n",
"- Pie chart\n",
"- Bar chart"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eab88ab0-1630-4c75-b7c9-d6a31a0c545b",
"metadata": {},
"outputs": [],
"source": [
"# Graph related definitions\n",
"labels = ['Hit', 'Pass', 'Miss']\n",
"colors = ['green', 'blue','red'] \n",
"bar_colors = ['tab:green', 'tab:blue','tab:red']\n",
"\n",
"def analyze(total_count, miss_count, hit_count, pass_count, request_type):\n",
"\n",
" # Calculate and print the cacheRatio\n",
" cache_ratio = (hit_count*100/(total_count))\n",
" \n",
" print(\"\\n\\n\")\n",
" print(colored(\"=================================================================================\", \"red\", attrs=[\"bold\"]))\n",
" print(colored(f\"The {request_type} cache hit ratio is: {cache_ratio:.2f}%\", \"red\", attrs=[\"bold\"]))\n",
" print(colored(\"=================================================================================\", \"red\", attrs=[\"bold\"]))\n",
"\n",
" #print(colored(f\"\\n \\n \\t {request_type} cache hit ratio is: {cache_ratio:.2f}% \\n\", \"red\", attrs=[\"bold\",\"underline\"]))\n",
"\n",
" displayPieChart(hit_count, pass_count, miss_count, total_count, request_type)\n",
"\n",
" displayBarGraph(hit_count, pass_count, miss_count, request_type)\n",
" \n",
"\n",
"def displayPieChart(hit_count, pass_count, miss_count, total_count, request_type):\n",
" \n",
" # Display the cache hit ratio as a pie chart\n",
" hit_percent = (hit_count*100/(total_count))\n",
" pass_percent = (pass_count*100/(total_count))\n",
" miss_percent = (miss_count*100/(total_count))\n",
"\n",
" # Data for the pie chart\n",
" sizes = [hit_percent, pass_percent, miss_percent] \n",
"\n",
" plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=140)\n",
"\n",
" # Add labels and title\n",
" plt.title(f'{request_type} - CDN Cache Percentage')\n",
"\n",
" # Show the plot\n",
" plt.show()\n",
"\n",
"def displayBarGraph(hit_count, pass_count, miss_count, request_type):\n",
" # Display the bar graph\n",
" fig, ax = plt.subplots()\n",
" counts = [hit_count, pass_count, miss_count]\n",
"\n",
" # Create the plot\n",
" ax.bar(labels, counts, label=labels, color=bar_colors)\n",
"\n",
" ax.set_ylabel('Count')\n",
" ax.set_xlabel('Cache-Type')\n",
" ax.set_title(f'{request_type} - CDN Cache Count')\n",
" ax.legend(title='Cache-Type color')\n",
"\n",
" # Show the plot\n",
" plt.show()\n",
"\n",
"# ***\n",
"# Note - FEEL FREE TO UPDATE BASED OO YOUR ANALYSIS REQUIREMENTS\n",
"# ***\n",
"\n",
"# Total Requests Analysis\n",
"total_count = len(df)\n",
"miss_count = (df['cache'] == 'MISS').sum()\n",
"hit_count = (df['cache'] == 'HIT').sum()\n",
"pass_count = (df['cache'] == 'PASS').sum()\n",
"\n",
"if total_count > 0:\n",
" analyze(total_count, miss_count, hit_count, pass_count, 'Total Requests')\n",
"\n",
"\n",
"# HTML Requests Analysis\n",
"total_count = (df['res_ctype'].str.startswith('text/html;')).sum()\n",
"miss_count = (df['res_ctype'].str.startswith('text/html;') & (df['cache'] == 'MISS')).sum() \n",
"hit_count = (df['res_ctype'].str.startswith('text/html;') & (df['cache'] == 'HIT')).sum() \n",
"pass_count = (df['res_ctype'].str.startswith('text/html;') & (df['cache'] == 'PASS')).sum() \n",
"\n",
"if total_count > 0:\n",
" analyze(total_count, miss_count, hit_count, pass_count, 'HTML Requests')\n",
"\n",
"\n",
"# JS & CSS Requests Analysis\n",
"total_count = (df['res_ctype'].str.startswith('application/javascript;') | df['res_ctype'].str.startswith('text/css;')).sum()\n",
"miss_count = ((df['res_ctype'].str.startswith('application/javascript;') | df['res_ctype'].str.startswith('text/css;')) & (df['cache'] == 'MISS')).sum() \n",
"hit_count = ((df['res_ctype'].str.startswith('application/javascript;') | df['res_ctype'].str.startswith('text/css;')) & (df['cache'] == 'HIT')).sum() \n",
"pass_count = ((df['res_ctype'].str.startswith('application/javascript;') | df['res_ctype'].str.startswith('text/css;')) & (df['cache'] == 'PASS')).sum() \n",
"\n",
"if total_count > 0:\n",
" analyze(total_count, miss_count, hit_count, pass_count, 'JavaScript & CSS Requests')\n",
"\n",
"\n",
"# Image Requests Analysis\n",
"total_count = (df['res_ctype'].str.startswith('image/')).sum()\n",
"miss_count = (df['res_ctype'].str.startswith('image/') & (df['cache'] == 'MISS')).sum() \n",
"hit_count = (df['res_ctype'].str.startswith('image/') & (df['cache'] == 'HIT')).sum() \n",
"pass_count = (df['res_ctype'].str.startswith('image/') & (df['cache'] == 'PASS')).sum() \n",
"\n",
"if total_count > 0:\n",
" analyze(total_count, miss_count, hit_count, pass_count, 'Image Requests')\n",
"\n",
"\n",
"# JSON Requests Analysis\n",
"total_count = (df['res_ctype'].str.startswith('application/json')).sum()\n",
"miss_count = (df['res_ctype'].str.startswith('application/json') & (df['cache'] == 'MISS')).sum() \n",
"hit_count = (df['res_ctype'].str.startswith('application/json') & (df['cache'] == 'HIT')).sum() \n",
"pass_count = (df['res_ctype'].str.startswith('application/json') & (df['cache'] == 'PASS')).sum() \n",
"\n",
"if total_count > 0:\n",
" analyze(total_count, miss_count, hit_count, pass_count, 'JSON Requests')\n"
]
},
{
"cell_type": "markdown",
"id": "1e019dcb-df7b-4b26-8300-9b3d8cf8f2df",
"metadata": {},
"source": [
"### Top 5 MISS and PASS Request URLs for HTML, JS/CSS and Image \n",
"\n",
">\n",
"> **Note::**\n",
"> This information is helpful when your Cache Hit ratio is less than 90%. \n",
">\n",
"\n",
"\n",
"For each request type (HTML, JS/CSS, Image) show the top 5 MISS, and PASS request URLs and their counts. \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c22ecf6-6249-4700-9e7f-eb4413a23695",
"metadata": {},
"outputs": [],
"source": [
"from tabulate import tabulate\n",
"\n",
"def displayTopURLsAndCounts(cache_type, request_type, no_of_results):\n",
"\n",
" # Filter rows where 'cache' is equal to and 'res_ctype' starts with\n",
" filter1_df = df[df['cache'] == cache_type ]\n",
" \n",
" filter2_df = filter1_df[filter1_df['res_ctype'].str.startswith(request_type)]\n",
" \n",
" # Group by 'url' and count occurrences of 'cache' equal to 'PASS'\n",
" result = filter2_df.groupby('url')['cache'].count().reset_index()\n",
" \n",
" # Sort the result in descending order by 'cache' count\n",
" result = result.sort_values(by='cache', ascending=False)\n",
" \n",
" # Rename the columns\n",
" result.columns = ['url', 'total_count']\n",
" \n",
" # Sort the result in descending order by 'total_count'\n",
" result = result.sort_values(by='total_count', ascending=False)\n",
"\n",
" # Get the top results\n",
" top_results = result.head(no_of_results)\n",
" \n",
" if top_results is not None:\n",
" # Print the top results as a table with custom headings\n",
" table = tabulate(top_results, headers='keys', tablefmt='grid', showindex=\"never\")\n",
" print(table)\n",
" else:\n",
" print('No data found with given criteria')\n",
"\n",
"\n",
"no_of_results = 5\n",
"\n",
"\n",
"print(\"\\n\\n\")\n",
"print(colored(\"=================================================================================\", \"red\", attrs=[\"bold\"]))\n",
"print(colored(f\" Top {no_of_results} HTML Requests of \", \"red\", attrs=[\"bold\"]))\n",
"print(colored(\"=================================================================================\", \"red\", attrs=[\"bold\"]))\n",
"\n",
"cache_type = 'MISS'\n",
"print(colored(f\"\\n{cache_type} Cache Type \\n\", \"red\", attrs=[\"bold\"]))\n",
"displayTopURLsAndCounts(cache_type,'text/html',no_of_results)\n",
"\n",
"cache_type = 'PASS'\n",
"print(colored(f\"\\n {cache_type} Cache Type \\n\", \"red\", attrs=[\"bold\"]))\n",
"displayTopURLsAndCounts(cache_type,'text/html',no_of_results)\n",
"\n",
"\n",
"print(\"\\n\\n\")\n",
"print(colored(\"=================================================================================\", \"red\", attrs=[\"bold\"]))\n",
"print(colored(f\" Top {no_of_results} Image Requests of \", \"red\", attrs=[\"bold\"]))\n",
"print(colored(\"=================================================================================\", \"red\", attrs=[\"bold\"]))\n",
"\n",
"cache_type = 'MISS'\n",
"print(colored(f\"\\n{cache_type} Cache Type \\n\", \"red\", attrs=[\"bold\"]))\n",
"displayTopURLsAndCounts(cache_type,'image/',no_of_results)\n",
"\n",
"cache_type = 'PASS'\n",
"print(colored(f\"\\n {cache_type} Cache Type \\n\", \"red\", attrs=[\"bold\"]))\n",
"displayTopURLsAndCounts(cache_type,'image/',no_of_results)\n",
"\n",
"\n",
"print(\"\\n\\n\")\n",
"print(colored(\"=================================================================================\", \"red\", attrs=[\"bold\"]))\n",
"print(colored(f\" Top {no_of_results} JavaScript Requests of \", \"red\", attrs=[\"bold\"]))\n",
"print(colored(\"=================================================================================\", \"red\", attrs=[\"bold\"]))\n",
"\n",
"cache_type = 'MISS'\n",
"print(colored(f\"\\n{cache_type} Cache Type \\n\", \"red\", attrs=[\"bold\"]))\n",
"displayTopURLsAndCounts(cache_type,'application/javascript',no_of_results)\n",
"\n",
"cache_type = 'PASS'\n",
"print(colored(f\"\\n {cache_type} Cache Type \\n\", \"red\", attrs=[\"bold\"]))\n",
"displayTopURLsAndCounts(cache_type,'application/javascript',no_of_results)\n",
"\n",
"\n",
"print(\"\\n\\n\")\n",
"print(colored(\"=================================================================================\", \"red\", attrs=[\"bold\"]))\n",
"print(colored(f\" Top {no_of_results} CSS Requests of \", \"red\", attrs=[\"bold\"]))\n",
"print(colored(\"=================================================================================\", \"red\", attrs=[\"bold\"]))\n",
"\n",
"cache_type = 'MISS'\n",
"print(colored(f\"\\n{cache_type} Cache Type \\n\", \"red\", attrs=[\"bold\"]))\n",
"displayTopURLsAndCounts(cache_type,'text/css',no_of_results)\n",
"\n",
"cache_type = 'PASS'\n",
"print(colored(f\"\\n {cache_type} Cache Type \\n\", \"red\", attrs=[\"bold\"]))\n",
"displayTopURLsAndCounts(cache_type,'text/css',no_of_results)\n",
"\n",
"\n",
"print(\"\\n\\n\")\n",
"print(colored(\"=================================================================================\", \"red\", attrs=[\"bold\"]))\n",
"print(colored(f\" Top {no_of_results} JSON Requests of \", \"red\", attrs=[\"bold\"]))\n",
"print(colored(\"=================================================================================\", \"red\", attrs=[\"bold\"]))\n",
"\n",
"cache_type = 'MISS'\n",
"print(colored(f\"\\n{cache_type} Cache Type \\n\", \"red\", attrs=[\"bold\"]))\n",
"displayTopURLsAndCounts(cache_type,'application/json',no_of_results)\n",
"\n",
"cache_type = 'PASS'\n",
"print(colored(f\"\\n {cache_type} Cache Type \\n\", \"red\", attrs=[\"bold\"]))\n",
"displayTopURLsAndCounts(cache_type,'application/json',no_of_results)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1ba2a2c-c7b6-48dd-9805-c3b81755d30b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}