Link Checker: Find broken links in a Web site

Link Checker #broken links checker

by Stephen Johns - 7 years ago (2017-08-28)

Find broken links in a Web site

+5	I need a way to find broken links in a Web site.

1 Clarification request
1. by Alekos Psimikakis - 7 years ago (2017-09-25) Reply
The description of your need is ambiguous. I believe you mean 'dead links'. OK, but how do you want to use this 'finder'?

If you just need to check whether a link exists, search Google for php check if a link exists (without quotes). There are plenty of examples. This one is good: "How can I check if a URL exists via PHP? - Stack Overflow", stackoverflow.com/questions/2280394/how-can-i-check-if-a-url-exists-via-php
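As a rough illustration of such a check, here is a minimal sketch using PHP's built-in get_headers(). The function names finalStatusFromHeaders and urlExists are made up for this example, and treating only 2xx responses as "exists" is an assumption, not something taken from the linked answer.

```php
<?php
/**
 * Return the last HTTP status code found in a list of raw header lines,
 * or null if there is no status line. When redirects are followed,
 * get_headers() returns one status line per hop, so the last one wins.
 */
function finalStatusFromHeaders(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

/** Hypothetical helper: true if the URL finally answers with a 2xx status. */
function urlExists(string $url, int $timeoutSeconds = 10): bool
{
    // HEAD avoids downloading the body. Some servers reject HEAD, so a
    // fallback to GET may be needed in practice.
    $context = stream_context_create([
        'http' => ['method' => 'HEAD', 'timeout' => $timeoutSeconds],
    ]);
    $headers = @get_headers($url, false, $context);
    if ($headers === false) {
        return false; // DNS failure, refused connection, timeout, ...
    }
    $status = finalStatusFromHeaders($headers);
    return $status !== null && $status >= 200 && $status < 300;
}
```

Calling urlExists('https://www.example.com/') would then return true or false depending on what the server answers at that moment.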

5 Recommendations

PHP Get HTTP Status Code from URL: Access a page and return the HTTP status code

This class can access a page and return the HTTP status code.

It sends an HTTP request to a page with a given URL and retrieves the response.

The class returns the server response status code number so it is possible to determine if the page is available or not.

by Jason Olson package author 110 - 6 years ago (2019-03-05) Comment

This class takes a given URL and returns the HTTP status code of the page, for example 404 for page not found, 200 for found, or 301 for redirect. It is not clear whether you want to test a database or list of specific URLs, or crawl a page or site looking for bad links. If you want to crawl, it would also help to know whether you are looking for bad internal links or bad external links.
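To make the two distinctions in this comment concrete (status code meaning, and internal versus external links), here is a small sketch. It is not the recommended class's API; classifyStatus and isInternalLink are hypothetical names, and the grouping of codes into 'ok', 'redirect' and 'broken' is an assumption.

```php
<?php
/** Map an HTTP status code to a coarse link state. */
function classifyStatus(int $code): string
{
    if ($code >= 200 && $code < 300) {
        return 'ok';        // e.g. 200 found
    }
    if ($code >= 300 && $code < 400) {
        return 'redirect';  // e.g. 301 moved permanently
    }
    if ($code >= 400 && $code < 600) {
        return 'broken';    // e.g. 404 not found, 500 server error
    }
    return 'unknown';       // e.g. 0 when no response was received
}

/**
 * True if $url points to the same host as the site being checked.
 * Assumes http(s) links only; a link without a host part (such as
 * "/about") is treated as internal.
 */
function isInternalLink(string $url, string $siteHost): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return true;
    }
    return strcasecmp($host, $siteHost) === 0; // host names are case-insensitive
}
```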

Very simple page details: Parse and extract Web page information details

This class can parse and extract Web page information details.

It can retrieve a Web page from a given URL and parse it to extract details like:

- Page title
- Page head and body
- Meta tags
- Character set
- Links expanded to full path
- Images
- Page headers from H1 through H6
- Internal and external links checking if they are broken
- Page elements by class or id value

by zinsou A.A.E.Moïse package author 6835 - 7 years ago (2017-09-16) Comment

You can try this. It has a static method to check whether any given URL is a broken link, and three other methods to get all broken internal links, all broken external links, or all broken links (internal and external) of a given web page or local file. The package has many other methods to get more details about a given page.

1 Comment
5. by Mutale Mulenga - 4 years ago (2021-01-20) in reply to comment 4 by zinsou A.A.E.Moïse Reply
Your code has given me a very big leap in my efforts to add services to my clients. Thank you very much.

PHP Link Checker: Extract and check links on a page

This class can extract and check links on a page.

It can retrieve the contents of a page with a given URL and extract the links it contains.

The class can check if the links the page contains point to valid pages.

The results are outputted to a given output stream.

+3	by Maik Greubel package author 185 - 7 years ago (2017-09-16) Comment

You can try this package. It will check all anchor links on a given site for existence (HTTP status 200).

PHP CURL Component: Compose and execute HTTP requests with Curl

This package can compose and execute HTTP requests with Curl.

It provides a fluent interface to define several parameters of an HTTP request to be sent to a given URL using the Curl library.

Currently it provides means to define the request URL, request method (POST, GET, DELETE, PATCH and PUT), request parameter values, and timeout values.

Other calls can tell the package to execute the request and retrieve the response.

+2	by Fernando 70 - 7 years ago (2017-08-30) Comment

I do not think there is a package to handle that. It basically comes down to sending a request to each link and analyzing the response. Use Curl to accomplish that.
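A minimal sketch of that idea with PHP's curl extension follows. It does not use the fluent interface of the package above; curlStatusCode is a hypothetical helper name, and the timeout and redirect limits are arbitrary example values.

```php
<?php
/**
 * Return the HTTP status code for $url, or 0 if the request failed
 * before any response arrived (DNS error, refused connection, timeout).
 * Requires the curl extension.
 */
function curlStatusCode(string $url, int $timeoutSeconds = 10): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true, // HEAD request: headers only
        CURLOPT_RETURNTRANSFER => true, // do not echo the response
        CURLOPT_FOLLOWLOCATION => true, // report the status of the final target
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
}
```

A link could then be reported as broken when the returned code is 0 or is 400 or higher. Some servers answer HEAD requests differently from GET, so repeating a failed check with a normal GET request may be worthwhile.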

PHP HTTP protocol client: HTTP client to access Web site pages

Class that implements requests to Web resources using the HTTP protocol.

It features:

- May submit HTTP requests with any method, to any page, to any server, connecting to any port.
- Provides support to setup connection and request arguments from a given URL.
- May submit requests via a proxy server with support for authentication if necessary.
- May establish connections via a SOCKS server.
- Supports HTTP direct access or proxy based authentication mechanisms via SASL class library like HTTP Basic, HTTP Digest or NTLM (Windows or Samba).
- Supports secure connections (https) via Curl library with SSL support, or at least PHP 4.3.0 with OpenSSL support, or via a non-SSL HTTP proxy server.
- Supports accessing secure pages using SSL certificates and private keys using Curl library.
- Supports user defined request headers.
- Supports POST requests with a user defined array of form values.
- Supports POST requests with user defined request bodies, for instance for making requests to SOAP services.
- Supports streaming requests that require uploading large amounts of data of undefined length in small chunks to avoid exceeding PHP memory limits.
- Supports requests to sites hosting virtual Web servers.
- Retrieves the HTTP response headers and body data separately.
- Supports HTTP 1.1 chunked content encoding.
- Supports session and persistent cookies.
- Provides optional handling of redirected pages.
- Supports defining connection and data transfer timeout values.
- Can output connection debug information in plain text or formatted as HTML.
- An add-on class is provided to log in to Yahoo sites and perform actions on behalf of the logged-in users, like exporting the user address book or sending an invitation to a group.

by Dave Smith 7620 - 7 years ago (2017-08-28) Comment

It is a multi-part process. First you need to scrape the website and retrieve the links, which is fairly easy. Then you can use this class to send HTTP requests to the linked sites and capture the responses to check whether they return a successful status.
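The second part of that process could look roughly like the sketch below. It assumes the links have already been scraped, and it takes the HTTP client as a callable so that the class above or any other client can be plugged in. The name findBrokenLinks and the callable's shape (URL in, status code out, redirects already followed) are assumptions made for this example.

```php
<?php
/**
 * Given links already scraped from a page, return the ones that did not
 * answer with a 2xx status, as an array of url => status code.
 *
 * $statusOf is a hypothetical callable (string $url): int that performs
 * the actual HTTP request and returns the final status code.
 */
function findBrokenLinks(array $links, callable $statusOf): array
{
    $broken = [];
    foreach (array_unique($links) as $url) { // request each distinct URL once
        $code = $statusOf($url);
        if ($code < 200 || $code >= 300) {
            $broken[$url] = $code;
        }
    }
    return $broken;
}
```

Passing the client in as a callable also makes the function easy to try out with a fake client that returns canned status codes.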

3 Comments
1. by Melanie Wehowski - 7 years ago (2017-08-30) Reply
I agree with Dave Smith in recommending https://www.phpclasses.org/package/3-PHP-HTTP-client-to-access-Web-site-pages.html for testing the HTTP response code; you can fetch only the headers and check the response code. For the first task, fetching the links, I would recommend:
- either php.net/manual/de/class.domdocument.php
- or (handling invalid HTML) simplehtmldom.sourceforge.net/
- or just a simple REGEX:
  
  $regexp = "<a\s[^>]href=(\"??)([^\" >]?)\\1[^>]>(.)<\/a>"; preg_match_all("/$regexp/siU", $this->content, $matches);
2. by Melanie Wehowski - 7 years ago (2017-08-30) in reply to comment 1 by Melanie Wehowski Reply
Somehow the regex in my answer was broken by the site, here it is as gist gist.github.com/wehowski/afc811cb4eb727e97e2a75b1b9d3e3c6
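For the DOMDocument option mentioned in comment 1, a minimal sketch might look like this. It is not the code from the gist; extractAnchorHrefs is a hypothetical name, and it only looks at a elements.

```php
<?php
/** Return the href value of every <a> element in $html, in document order. */
function extractAnchorHrefs(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    // Real-world HTML is often invalid; collect parser warnings quietly
    // instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $hrefs[] = $href; // anchors without href (e.g. name only) are skipped
        }
    }
    return $hrefs;
}
```

The returned values are exactly what the page contains, so relative links such as /contact still need to be resolved against the page URL before they can be requested.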
3. by Axel Hahn - 7 years ago (2017-10-06) Reply
I agree with this too :-)

For a single webpage you can fetch it (with curl), then parse it (with DOM or regex) to get all links (can be in tags a, iframe, img, link, style, source, ...) and then check these.
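As a sketch of collecting links from several tag types at once, the example below uses DOMXPath with a union expression. The tag and attribute pairs are an assumption based on the tags named above, and extractResourceUrls is a hypothetical name. URLs inside style elements are CSS text and would need separate parsing, so they are not covered here.

```php
<?php
/**
 * Collect URLs from a, link, iframe, img and source elements.
 * Returns an array of url => tag name where the URL was first seen.
 */
function extractResourceUrls(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $previous = libxml_use_internal_errors(true); // tolerate invalid markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Each branch selects the attribute node that carries the URL.
    $attributes = $xpath->query(
        '//a/@href | //link/@href | //iframe/@src | //img/@src | //source/@src'
    );

    $urls = [];
    foreach ($attributes as $attribute) {
        $value = trim($attribute->value);
        if ($value !== '' && !isset($urls[$value])) {
            $urls[$value] = $attribute->ownerElement->nodeName;
        }
    }
    return $urls;
}
```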

To check a complete website you need a bit more: because you want to check each link only once, keep all results in a database. A single class cannot (and should not) do this.
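The "check each link only once" idea can be sketched as a breadth-first crawl with a record of URLs already seen. Here a plain PHP array stands in for the database, and the page fetching is passed in as a callable. The name crawlSite, the callable's return shape and the page limit are all assumptions for illustration, not a description of the crawler mentioned below.

```php
<?php
/**
 * Breadth-first crawl that checks every URL exactly once.
 *
 * $check is a hypothetical callable (string $url): array returning
 * ['status' => int, 'links' => string[]], with links as absolute URLs.
 * Returns url => status for everything that was checked.
 */
function crawlSite(string $startUrl, callable $check, int $maxPages = 500): array
{
    $siteHost = parse_url($startUrl, PHP_URL_HOST);
    $results = [];        // url => status; doubles as the "already checked" set
    $queue = [$startUrl];

    while ($queue !== [] && count($results) < $maxPages) {
        $url = array_shift($queue);
        if (array_key_exists($url, $results)) {
            continue;     // queued more than once, but checked only once
        }
        $page = $check($url);
        $results[$url] = $page['status'];

        // External URLs are checked but their own links are not followed.
        if (parse_url($url, PHP_URL_HOST) !== $siteHost) {
            continue;
        }
        foreach ($page['links'] as $link) {
            if (!array_key_exists($link, $results)) {
                $queue[] = $link;
            }
        }
    }
    return $results;
}
```

For a real site the results would go into persistent storage, so that a long crawl can be resumed and earlier results can be compared with later ones.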

I am currently writing my own crawler and resource checker with a web browser interface, but it is still beta (and not linked in my phpclasses projects yet).
