---
layout: "../../../layouts/InteractiveDemoLayout.astro"
title: "Visual Sitemap"
tagline: "Using Cytoscape.js to visualize all the pages of a website"
date: "January 6, 2023"
category: "Demo"
image: "https://images.unsplash.com/photo-1639322537228-f710d846310a?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=1632&q=80"
imageattr: "https://unsplash.com/photos/T9rKvI3N0NM"
---
import Sitemapgraph from './sitemapgraph.svelte'

### Background

I was given a project at work to help improve the SEO of our organization's website. We quickly realized that we did not have a good overview of everything on the website, which kicked off a search for a tool that could visualize all of its pages. While there are plenty of expensive SEO tools, we struggled to find anything that would simply give us a hierarchical view of pages. So if the tool doesn't exist, we need to make the tool.
### Designing the tool



I set out to make a generic site mapper that could take a URL and crawl every other page on the site. I wanted it to have a simple command line interface, the ability to save results for later viewing, and an integrated web server it could spin up for the visualization. I also took the opportunity to write it all in TypeScript, and I mean really use it, not just write JavaScript in a .ts file.
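
To give a feel for that interface, here is a minimal sketch of what the command line layer could look like with [Yargs](http://yargs.js.org/); the `out` and `serve` flags are hypothetical placeholders for illustration, not the tool's actual options.

```typescript
// Sketch of the CLI surface. Flag names here are illustrative only.
import yargs from 'yargs';
import { hideBin } from 'yargs/helpers';

const argv = yargs(hideBin(process.argv))
  .command('$0 <url>', 'Crawl a site and build a visual sitemap', (y) =>
    y.positional('url', { type: 'string', describe: 'URL to start crawling from' })
  )
  .option('out', {
    type: 'string',
    default: 'sitemap.json',
    describe: 'File to save the crawl results to',
  })
  .option('serve', {
    type: 'boolean',
    default: false,
    describe: 'Spin up the integrated web server for the visualization',
  })
  .help()
  .parseSync();

console.log(argv); // { url, out, serve, ... }
```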
The command line interface and saving a file to disk are straightforward enough. The Node `fs` module lets us read and write to the host filesystem, and [Yargs](http://yargs.js.org/) can handle the command line stuff for us. Even crawling a website wasn't too difficult: the [sitemap-generator](https://www.npmjs.com/package/sitemap-generator) project handles all the heavy lifting. The generator creates a crawler object that keeps track of every page it visits in a queue. By looking at that queue after the crawler finishes, we get JSON objects for every page visited and, importantly for us, the page that linked to it. Finally, the [Express](https://expressjs.com/) framework handles the web server bits for the visualization.
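
As a concrete illustration, here is a rough sketch of that crawl step. The generator methods and events follow the sitemap-generator docs, but the shape of the queue items comes from the underlying simplecrawler and may vary between versions, so treat this as a sketch of the idea rather than the project's exact code.

```typescript
// Crawl a site and collect every visited URL plus the page that linked to it.
import SitemapGenerator from 'sitemap-generator';

interface CrawledPage {
  url: string;
  referrer?: string; // the page that linked to this one
}

function crawlSite(entryUrl: string): Promise<CrawledPage[]> {
  return new Promise((resolve) => {
    const generator = SitemapGenerator(entryUrl, { stripQuerystring: true });

    generator.on('done', () => {
      // The underlying crawler keeps every visited item in its queue,
      // including the referrer. Item shape may differ between versions.
      const queue = generator.getCrawler().queue;
      const pages = Array.from(queue).map((item: any) => ({
        url: item.url,
        referrer: item.referrer,
      }));
      resolve(pages);
    });

    generator.start();
  });
}
```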
### Putting it all together

So we have a great big array of JSON objects, each with a URL and a referrer among other things. How do we take this data and turn it into something usable?
***Graphs***

My CS200 professor can rest easy: I was paying attention and did learn something. Side note: for the sake of this app, we decided not to care about back links. While this doesn't change the implementation of our graph, it does simplify displaying it, since we don't have to worry about loops. On to the code.
```typescript
// Each vertex is a page; each edge points at a page it links to.
class Vertex {
  name: string;
  adj: Array<Edge>;

  constructor(name: string) {
    this.name = name;
    this.adj = new Array<Edge>();
  }

  toString() {
    return this.name;
  }
}

class Edge {
  destination: Vertex;

  constructor(vertex: Vertex) {
    this.destination = vertex;
  }
}
```
Each Vertex in the graph has a name and an array of edges, and each Edge has a destination vertex. Easy enough. Now for the graph itself. We keep a map from vertex names to their corresponding vertices. If a requested vertex does not exist in the map, we create it, add it to the map, and return it. Adding an edge then becomes as simple as passing the names of the two vertices we want to connect: any vertex that doesn't exist yet is created, and the new edge is pushed onto the source vertex's adjacency array. That's really it. Some code is omitted from the example below for printing the graph as well as serializing the graph object so it can be saved to disk.
```typescript
class Graph {
  vertices: Map<string, Vertex>;

  constructor() {
    this.vertices = new Map<string, Vertex>();
  }

  // Look up a vertex by name, creating it if it doesn't exist yet.
  getVertex(name: string) {
    let v = this.vertices.get(name);
    if (v === undefined) {
      v = new Vertex(name);
      this.vertices.set(name, v);
    }
    return v;
  }

  // Connect two pages by name; missing vertices are created on the fly.
  addEdge(source: string, dest: string) {
    const v = this.getVertex(source);
    const w = this.getVertex(dest);
    v.adj.push(new Edge(w));
  }
}
```
Now that we have a graph representation of our site, we just need to convert it into something a bit more visually appealing. [Cytoscape.js](https://js.cytoscape.org/) will handle that for us. I was tempted to use something like p5.js or D3 for the visualization, but Cytoscape seemed better built for graph and network data structures. The translation from our graph data structure to Cytoscape is super easy: just pass in an array of vertices and edges.
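
A minimal sketch of that translation, assuming the `Graph`, `Vertex`, and `Edge` classes above and a container `<div id="cy">` on the page; the element ids and the `breadthfirst` layout are illustrative choices, not necessarily what the demo itself uses.

```typescript
import cytoscape from 'cytoscape';

// Flatten our Graph (defined above) into Cytoscape's elements format:
// one node per vertex, one edge per adjacency entry.
function toElements(graph: Graph): cytoscape.ElementDefinition[] {
  const elements: cytoscape.ElementDefinition[] = [];
  for (const [name, vertex] of graph.vertices) {
    elements.push({ data: { id: name } });
    for (const edge of vertex.adj) {
      elements.push({
        data: {
          id: `${name} -> ${edge.destination.name}`,
          source: name,
          target: edge.destination.name,
        },
      });
    }
  }
  return elements;
}

// siteGraph would be populated from the crawl results via addEdge(referrer, url).
const siteGraph = new Graph();

// In the browser, hand the elements to Cytoscape along with a container
// element and a layout ('breadthfirst' gives a rough hierarchy).
const cy = cytoscape({
  container: document.getElementById('cy'),
  elements: toElements(siteGraph),
  layout: { name: 'breadthfirst' },
});
```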
Finally, after all that crawling and parsing, we are left with the desired result: an interactive, hierarchical view of every page on a website. Is the tool perfect? No, but it was good enough for what we needed.

<Sitemapgraph client:load/>