URL Extraction after crawling web pages

2019-10-31 research API osint naver

김기철
Intelligent digital investigation tools

When crawling web pages, you often want to collect the links on a page so they can serve as the next candidates for crawling. Below is a Python script that extracts the URLs of all links on a given page, such as a page returned in the search results of a crawler built on the Naver REST APIs.

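import urllib.request
from urllib.parse import urljoin
from bs4 import BeautifulSoup

targetUrl = input('URL: ')
# Download the page; the with-block closes the connection afterwards
with urllib.request.urlopen(targetUrl) as response:
    html = response.read()
soup = BeautifulSoup(html, 'html.parser')

# Print the href of every <a> tag, resolved against the page URL so relative links become absolute
for a in soup.find_all('a', href=True):
    print(urljoin(targetUrl, a['href']))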
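For links to serve as the next crawl candidates, a crawler usually needs a little more than the raw href values: relative links resolved to absolute URLs, #fragments dropped, non-web schemes such as mailto: skipped, and pages it has already visited filtered out. The sketch below shows one way to do that. It assumes the standard-library html.parser instead of BeautifulSoup so it runs without extra packages; the names LinkCollector, next_candidates and the visited set are hypothetical illustrations, not part of the original script.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value)
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def next_candidates(html, base_url, visited):
    """Return new absolute http(s) URLs linked from html, in first-seen order.

    Relative links are resolved against base_url, #fragments are dropped,
    and URLs already in visited (or repeated on the page) are skipped.
    """
    parser = LinkCollector()
    parser.feed(html)
    parser.close()

    seen = set(visited)
    candidates = []
    for href in parser.hrefs:
        url, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(url).scheme not in ('http', 'https'):
            continue  # skip mailto:, javascript:, and similar links
        if url not in seen:
            seen.add(url)
            candidates.append(url)
    return candidates


if __name__ == '__main__':
    page = ('<a href="/news?id=1">A</a>'
            '<a href="https://example.com/news?id=1#top">A again</a>'
            '<a href="mailto:me@example.com">mail</a>'
            '<a href="sub/page.html">B</a>')
    print(next_candidates(page, 'https://example.com/index.html', visited=set()))
    # prints ['https://example.com/news?id=1', 'https://example.com/sub/page.html']
```

Keeping visited URLs in a set makes the "already crawled?" check cheap as the crawl grows. A real crawler would typically also restrict candidates to allowed domains and respect robots.txt, which this sketch leaves out.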