How to Extract URLs from Webpages
If you are wondering how to extract URLs from any webpage, for free and fast, you are in the right place!
I have selected the three best methods:
- Browser
- Addon
- Terminal
Let’s jump right into it.
1. Extract URLs from webpages with your browser:
You will need:
- a browser
- a short piece of JavaScript
Navigate to the page from which you’d like to extract links.
Right-click the page and select “Inspect”.
This opens the developer tools. Switch to the Console tab, then type or paste this code:
var links = document.querySelectorAll("a");
var myarray = [];
for (var i = 0; i < links.length; i++) {
  // Collapse whitespace in the link text so the table stays readable.
  var cleantext = links[i].textContent.replace(/\s+/g, " ").trim();
  myarray.push([cleantext, links[i].href]);
}

function make_table() {
  var table = "<table><thead><tr><th>Name</th><th>Links</th></tr></thead><tbody>";
  for (var i = 0; i < myarray.length; i++) {
    table += "<tr><td>" + myarray[i][0] + "</td><td>" + myarray[i][1] + "</td></tr>";
  }
  table += "</tbody></table>";
  // Open a blank tab and write the table into it.
  var w = window.open("");
  w.document.write(table);
}
make_table();
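If you would rather get a CSV you can paste into a spreadsheet than an HTML table, a small variant of the same idea works too. The helper below is a sketch of my own (`toCsv` is a hypothetical name, not part of any library); it quotes each cell so commas and quotes in link text do not break the format.

```javascript
// Hypothetical helper: turn an array of [text, href] pairs into CSV text.
function toCsv(rows) {
  return rows
    .map(function (row) {
      return row
        .map(function (cell) {
          // Quote every cell and escape embedded double quotes.
          return '"' + String(cell).replace(/"/g, '""') + '"';
        })
        .join(",");
    })
    .join("\n");
}

// In the browser console you would build the rows from the page's links, e.g.:
// var rows = [...document.querySelectorAll("a")].map(a => [a.textContent.trim(), a.href]);
console.log(toCsv([["Example", "https://example.com/"]]));
```

Copy the console output into a `.csv` file and you have the same data as the table, in a spreadsheet-friendly form.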
2. Extract URLs from webpages with addons:
I can recommend a really powerful Chrome extension called Link Klipper.
This extension allows you to:
- Extract all the links on the webpage
- Store all the extracted links as a CSV file
- Custom drag a selectable area on the webpage from which all the links will be extracted
3. Extract URLs from webpages with the terminal:
Here is what you will need to do the magic:
- a Mac
- Terminal
- wget installed
First of all, check whether wget is already installed by running the following command:
$ wget -V
If it is not installed yet, install Homebrew first:
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Then you need to install wget:
$ brew install wget
Now you have everything you need to extract URLs from a webpage, for free, with one single command:
$ wget --mirror --delete-after --no-directories https://www.the-page-you-wanna-crawl.com 2>&1 | grep '^--' | awk '{print $3}' | sort >extracted-URLs.txt
Here is how it works: wget crawls the site and logs each URL it fetches on a line starting with "--"; grep keeps those lines, awk prints the third field (the URL itself), and sort orders the result into a text file called extracted-URLs.txt (which you can rename if you like).
This method was shared on Twitter by John Mueller.
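If wget is not an option, the same idea can be approximated in Node.js on HTML you have already downloaded. This is only a rough sketch under my own assumptions (`extractHrefs` is a hypothetical name, and a regex is not a real HTML parser, so it will miss unquoted or single-quoted attributes):

```javascript
// Rough sketch: pull href values out of an HTML string with a regex,
// de-duplicate them, and return them sorted (like the wget pipeline above).
function extractHrefs(html) {
  const matches = [...html.matchAll(/href="([^"]+)"/g)];
  return [...new Set(matches.map(function (m) { return m[1]; }))].sort();
}

const sample = '<a href="https://b.example/">B</a> <a href="https://a.example/">A</a>';
console.log(extractHrefs(sample));
```

For anything beyond a quick one-off, a proper HTML parser would be the safer choice.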
Have fun!