trurl: Command-line tool for URL Parsing and Manipulation

Daniel Stenberg, the creator of curl, has released a new open source tool that makes it easy to parse and manipulate URLs.

trurl is a small command-line tool for parsing and manipulating URLs. It was designed mainly for shell scripting, so that working with URLs in scripts and in the terminal is easy.

It supports operations such as extracting individual URL components (scheme, user, password, options, host, port, path, query, fragment, and zoneid), modifying components, appending to the path or query, and decoding percent-encoded URLs.

On top of that, it can read URLs from a file or from standard input (stdin), and it can print its output in JSON format.

Under the hood, it uses libcurl's URL parser, which is the same parser the curl command-line tool uses.

Before you begin, install the libcurl development package (libcurl4-openssl-dev or libcurl4-gnutls-dev on Debian-based systems). trurl needs it to compile.

Get Started with Trurl

To install the required dependency, run the command for your distribution. Building trurl also requires git, make, and a C compiler such as gcc.

On Debian/Ubuntu:
$ sudo apt install libcurl4-openssl-dev

On RHEL/CentOS/Fedora:
$ sudo yum install libcurl-devel
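Before you compile, you can quickly check that the libcurl development files are in place. This sketch assumes that `curl-config`, the helper script that ships with the libcurl development packages, is available after the install step. `check_libcurl_dev` is a hypothetical helper name.

```shell
#!/bin/sh
# Report whether the libcurl development files appear to be installed.
# Assumption: curl-config ships with the libcurl -dev/-devel package.
# check_libcurl_dev is a hypothetical helper name.
check_libcurl_dev() {
  if command -v curl-config >/dev/null 2>&1; then
    # curl-config --version prints the installed libcurl version string.
    echo "found $(curl-config --version)"
    return 0
  fi
  echo "curl-config not found: install the libcurl development package" >&2
  return 1
}

check_libcurl_dev || echo "fix the dependency before running make" >&2
```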

After installing the dependency, clone the project and change into the trurl directory:

$ git clone https://github.com/curl/trurl.git
$ cd trurl

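Next, compile trurl and move the binary to /usr/local/bin. This directory is the conventional location for locally built programs, and most distributions include it in the PATH.

$ make
$ sudo mv trurl /usr/local/bin/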
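To confirm that the shell can find the new binary, run a small check like the one below. It relies only on `command -v` and trurl's `--version` option. `verify_trurl` is a hypothetical helper name.

```shell
#!/bin/sh
# Confirm that trurl is on PATH and print the version it reports.
# verify_trurl is a hypothetical helper name.
verify_trurl() {
  trurl_path=$(command -v trurl) || {
    echo "trurl not found on PATH" >&2
    return 1
  }
  echo "trurl found at $trurl_path"
  trurl --version
}

verify_trurl || echo "check the directory you moved the binary to" >&2
```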

Command Usage

Now it's time to explore trurl's options for parsing and manipulating URLs.

First, let's see how trurl handles a URL that contains a percent-encoded character (%61 is the letter "a") and prints the normalized URL.

$ trurl ex%61mple.com/
http://example.com/

Notice that trurl automatically adds the http:// scheme, because the input did not specify one.

If the host name starts with a known protocol prefix such as ftp., imap., or smtp., trurl uses that protocol as the scheme instead:

$ trurl smtp.example.com/
smtp://smtp.example.com/
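The scheme guessing comes from libcurl. For a scheme-less input, libcurl recognizes the host prefixes ftp., dict., ldap., imap., smtp., and pop3. and falls back to http for anything else. If the input already has a scheme, that scheme is kept. The sketch below uses a hypothetical helper, `show_normalized`, to print what trurl makes of several inputs at once.

```shell
#!/bin/sh
# Print how trurl normalizes each input, with or without a scheme.
# show_normalized is a hypothetical helper name.
show_normalized() {
  for input in "$@"; do
    printf '%-26s -> %s\n' "$input" "$(trurl "$input")"
  done
}

# Run the demo only when trurl is installed.
if command -v trurl >/dev/null 2>&1; then
  show_normalized example.com/ ftp.example.com/ imap.example.com/ https://smtp.example.com/
fi
```

Under the guessing rules above, the four inputs should come out as http://, ftp://, imap://, and https:// URLs respectively.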

Now that you know the basics of trurl, let's see how to append data to a base URL.

Append to the Path or Query of a Base URL

To append data to a URL, use the --append option followed by component=value.

For example, the following command adds "newpath" to the path of the base URL:

$ trurl example.com/ --append path="newpath"
http://example.com/newpath

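Put the value in quotes if it contains spaces or other characters that your shell treats specially. Without quotes, the shell splits the value into separate arguments. trurl URL-encodes the appended segment itself, so a space in the path becomes %20.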
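When a script needs to append several path segments, you can pass one --append option per segment. The sketch below assumes that trurl accepts repeated --append options in a single call. `join_path` is a hypothetical helper name.

```shell
#!/bin/sh
# Append each argument after the base URL as its own path segment.
# join_path is a hypothetical helper. Usage: join_path BASE_URL SEGMENT...
join_path() {
  base=$1
  shift
  # Rewrite "$@" in place: each SEGMENT becomes "--append" "path=SEGMENT".
  for segment do
    set -- "$@" --append "path=$segment"
    shift
  done
  trurl --url "$base" "$@"
}

# Run the demo only when trurl is installed.
if command -v trurl >/dev/null 2>&1; then
  join_path https://example.com/docs v1 "release notes"
fi
```

If repeated --append options behave as assumed, the demo should print https://example.com/docs/v1/release%20notes.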

To add to the query string, use query=. The following command appends "bar" to the existing "s=foo" query and joins the two with an ampersand. The backslash stops the shell from treating ? as a wildcard.

$ trurl example.com\?s=foo --append query=bar
http://example.com/?s=foo&bar

Note that --append only works with the path and query components.
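In scripts, query parameters usually come from variables. The sketch below appends two key=value pairs. `search_url` and the example.com search endpoint are hypothetical. For query segments, trurl joins them with & and keeps the first = in each segment as the separator between key and value.

```shell
#!/bin/sh
# Build a search URL from a term and an optional page number (default 1).
# search_url and the endpoint are hypothetical. Usage: search_url TERM [PAGE]
search_url() {
  trurl --url "https://example.com/search" \
    --append query="q=$1" \
    --append query="page=${2:-1}"
}

# Run the demo only when trurl is installed.
if command -v trurl >/dev/null 2>&1; then
  search_url trurl 2
fi
```

For the demo call, the result should be https://example.com/search?q=trurl&page=2.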

Redirect Path

Above, you saw how to append to the path and query components. The --redirect option works differently: it resolves a new URL against the base URL, the way a client does when it follows an HTTP redirect. In the example below, the absolute path replaces the original path:

$ trurl example.com/main --redirect "/about/example.html"   
http://example.com/about/example.html
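Relative targets are resolved against the base URL's directory, much like curl handles a relative Location: header. The sketch below resolves several targets against the same base. `resolve` is a hypothetical helper name.

```shell
#!/bin/sh
# Resolve each target against one base URL, as when following a redirect.
# resolve is a hypothetical helper. Usage: resolve BASE_URL TARGET...
resolve() {
  base=$1
  shift
  for target do
    printf '%-14s -> %s\n' "$target" "$(trurl --url "$base" --redirect "$target")"
  done
}

# Run the demo only when trurl is installed.
if command -v trurl >/dev/null 2>&1; then
  resolve https://example.com/docs/guide/intro.html next.html ../faq.html /about/
fi
```

With this base, next.html should resolve to https://example.com/docs/guide/next.html, ../faq.html to https://example.com/docs/faq.html, and /about/ to https://example.com/about/.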

Set/Modify URL Component

To add or modify a specific URL component, use the -s or --set option, as shown below:

$ trurl https://example.com --set host="trendoceans.com"
https://trendoceans.com/
$ trurl https://example.com --set port="8080"
https://example.com:8080/
$ trurl https://example.com --set fragment="test"
https://example.com/#test
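You can combine several --set options in one call, and each one replaces a single component. The sketch below moves a URL to a new scheme, host, and port in one step. `migrate_url` is a hypothetical helper name, and the target values are only illustrative.

```shell
#!/bin/sh
# Move a URL to another scheme, host and port with one trurl call.
# migrate_url is a hypothetical helper. Usage: migrate_url URL
migrate_url() {
  trurl --url "$1" \
    --set scheme=https \
    --set host=trendoceans.com \
    --set port=8443
}

# Run the demo only when trurl is installed.
if command -v trurl >/dev/null 2>&1; then
  migrate_url http://example.com/login
fi
```

The path (/login) is left alone, so the demo should print https://trendoceans.com:8443/login.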

Extract URL Component

You can also use trurl to extract specific components from a URL with the -g or --get option. Put each component name inside curly braces, as shown below:

$ trurl --url https://example.com -g '{port}'
$ trurl --url https://example.com -g '{port} {host}'
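Because --get prints the components on one line, a script can load several of them into variables with a single call. The sketch below does this with a hypothetical helper, `split_url`, and then prints the same URL with --json, the JSON output option mentioned at the start. The JSON layout has changed between trurl releases, so check it on your version before parsing it.

```shell
#!/bin/sh
# Split a URL into shell variables with a single trurl call.
# split_url is a hypothetical helper; it sets $scheme, $host and $port.
split_url() {
  read -r scheme host port <<EOF
$(trurl --url "$1" -g '{scheme} {host} {port}')
EOF
}

# Run the demo only when trurl is installed.
if command -v trurl >/dev/null 2>&1; then
  split_url "https://example.com:8080/api/v1?debug=1"
  echo "scheme=$scheme host=$host port=$port"

  # The same URL as JSON (the exact layout depends on the trurl version).
  trurl --url "https://example.com:8080/api/v1?debug=1" --json
fi
```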

Parse & Extract URL Component from File

trurl can also process a whole list of URLs stored in a file, which makes batch parsing and manipulation much easier.

To read URLs from a file (one URL per line), use the --url-file or -f option followed by the path to the file, as shown below:

$ trurl --url-file ~/Documents/list-of-url
https://example.com/
http://test.com/
http://linuxmint.com/
ftp://ftp.example.com/

If you only want the port and host name of each URL, combine --url-file with --get:

$ trurl --url-file ~/Documents/list-of-url -g '{port} {host}'
443 example.com
80 test.com
80 linuxmint.com
21 ftp.example.com

Quite handy, right?
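trurl can also read URLs from standard input. Passing "-" as the file name to --url-file makes trurl read from stdin, so it fits into a pipeline. The sketch below lists the distinct host names in a stream of URLs. `unique_hosts` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the distinct host names from URLs read on standard input.
# "--url-file -" makes trurl read the URLs from stdin.
# unique_hosts is a hypothetical helper. Usage: some_command | unique_hosts
unique_hosts() {
  trurl --url-file - -g '{host}' | sort -u
}

# Run the demo only when trurl is installed.
if command -v trurl >/dev/null 2>&1; then
  printf '%s\n' https://example.com/a https://example.com/b ftp.example.com/ |
    unique_hosts
fi
```

The demo should print example.com and ftp.example.com, each once.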

Wrap up

That's all for this article. You learned how to use the trurl command to parse and manipulate URLs to suit your needs.

After reading this guide, you may find that you no longer need to write your own scripts for parsing or manipulating URLs.

trurl is in its early stages, so you might encounter some bugs or limitations, but it is definitely worth exploring for its simplicity and ease of use.

Star the project on GitHub to keep an eye on future updates and improvements and get the most out of this tool.
