Curl
According to the official website:
curl is a command line tool for transferring data with URL syntax, supporting DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, Telnet and TFTP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, kerberos...), file transfer resume, proxy tunneling and a busload of other useful tricks.
Curl is not only a command line program; it is also available as a library that other languages can use. PHP, for example, has a curl extension that exposes the features of the curl program as a programmable API. There are a few functions to learn and many options to know about, and then any PHP program can use curl to do many useful things. That is precisely what we shall do in this article: learn how to use curl from PHP.
To give a brief idea of what it can do: curl can download the contents of remote urls, download remote files, submit forms automatically from scripts, and so on. Although these are the most common uses of the curl library in PHP, curl is not limited to them and can do a lot more, as the definition above indicates.
With the introduction out of the way, let's get into it without any more delay.
Make GET requests - fetch a url
Fetching a remote url is the same as performing a GET request on that url. This returns the html contents of the url. Let's look at a program that does this and understand how it works.
//gets the data from a URL
function get_url($url)
{
    $ch = curl_init();
    if($ch === false)
    {
        die('Failed to create curl object');
    }

    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

echo get_url('http://www.apple.com/');
Open that script in your browser and it should show the contents of apple.com as expected. The get_url function takes a url as parameter and fetches its content using curl functions. The curl functions are called in the following sequence:
1. Create a curl object using the curl_init() function. If the curl extension is not installed, PHP throws a fatal error saying "Call to undefined function curl_init()". If the extension is installed but the handle cannot be created, curl_init() returns false.
2. Set the necessary curl options/parameters using the curl_setopt function.
3. Execute the curl request by calling curl_exec.
4. Close the curl object by calling curl_close.
Step 2 is the most important, since the correct options need to be provided to curl so that it can perform the request properly and fetch the results. Let's take a look at some of the basic options used in most requests.
CURLOPT_URL - This is the url to which the request is sent. In the case of POST requests it is the url to which the data is submitted or posted.
CURLOPT_RETURNTRANSFER - This makes curl_exec return the response as a string. Without it the output would be echoed directly to the screen or STDOUT.
So in case of a GET request the URL is the most important option to set, and the RETURNTRANSFER option lets the output be captured in a variable.
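The get_url function above returns whatever curl_exec gives back, and that is false when the request fails. Here is a minimal sketch of how such a failure could be detected. The helper name fetch_url_checked and the layout of the returned array are assumptions made for illustration; they are not part of the curl extension.

```php
<?php
// Hypothetical helper: same steps as get_url(), plus basic error reporting.
function fetch_url_checked($url, $timeout = 5)
{
    $ch = curl_init();
    if ($ch === false) {
        return array('ok' => false, 'status' => 0, 'body' => null, 'error' => 'curl_init failed');
    }

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

    $body = curl_exec($ch);                           // false when the transfer fails
    $error = curl_error($ch);                         // empty string when nothing went wrong
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);  // 0 when no HTTP response was received
    curl_close($ch);

    return array(
        'ok' => ($body !== false),
        'status' => $status,
        'body' => ($body === false) ? null : $body,
        'error' => $error,
    );
}
```

A caller could then check $result['ok'] before printing $result['body'], and log $result['error'] otherwise. Note that 'ok' only says the transfer completed; a 404 page still counts as a completed transfer, which is why the status code is returned as well.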
For a list of all options that php-curl supports check the following page
http://php.net/manual/en/function.curl-setopt.php
Note
The above GET request to a url can be done in a much simpler way like this
// Make a HTTP GET request and print it (requires allow_url_fopen to be enabled)
echo file_get_contents('http://www.apple.com/');
The file_get_contents function can fetch the contents of a url much like it does for local files. For simple GET requests you may want to use this shorter method instead of the lengthy curl call; curl becomes worthwhile once you need the options covered in the rest of this article, such as POST data, cookies and proxies.
Setting all curl options at once
Calling the curl_setopt function again and again to set the options is a bit tedious. There is a useful function called curl_setopt_array that takes an array of options and sets them all at once. Here is a quick example
curl_setopt_array($ch, array(
    CURLOPT_URL => $url,
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_CONNECTTIMEOUT => $timeout,
));
Make POST requests - submitting forms
Now that we have learned how to make basic GET requests using curl, it's time to make some POST requests. POST requests are mostly used to submit data to a url, as forms do. For example, the login and signup forms that you see on websites make a POST request to submit their data.
Let's take a simple program as an example
/**
    POST request in PHP using Curl
*/
$url = 'http://localhost/curl_submit.php';

$ch = curl_init();
if($ch === false)
{
    die('Failed to create curl object');
}

curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_POST, true);

// The submitted form data, encoded as query-string-style name-value pairs
$post_data = 'name=Harry&age=25';
curl_setopt ($ch, CURLOPT_POSTFIELDS, $post_data);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);

$output = curl_exec($ch);
curl_close ($ch);

echo $output;
The above program submits the data 'name=Harry&age=25' to the $url. Test the submission by creating a curl_submit.php in the document root of localhost that does a print_r($_POST). It will show what data was submitted.
The important options to set in a POST request are POST=true and POSTFIELDS=post_data. The first option tells curl that we want to make a POST HTTP request and not a GET request. The second option provides the data for the POST request.
Array syntax for post data
The variable $post_data can also be an array which makes it easier and safer to construct. Here is a quick example
$post_data = array('name' => 'Harry', 'age' => '25');
curl_setopt ($ch, CURLOPT_POSTFIELDS, $post_data);
Now curl takes care of encoding the parameters itself. Note that when an array is passed, the data is sent as multipart/form-data rather than as a url-encoded string. This approach also has a limitation: it fails for multi-dimensional arrays.
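One way around the limitation with multi-dimensional arrays is to encode the array yourself with PHP's built-in http_build_query function and pass the resulting string to CURLOPT_POSTFIELDS. A small sketch; the nested 'hobbies' field is made up for illustration.

```php
<?php
// Nested data that cannot be passed to CURLOPT_POSTFIELDS as an array
$post_data = array(
    'name' => 'Harry',
    'age' => '25',
    'hobbies' => array('chess', 'cricket'), // hypothetical nested field
);

// Encode as a url-encoded string; nested keys become hobbies[0], hobbies[1]
// with the brackets percent-encoded. The separator is passed explicitly so
// the result does not depend on the arg_separator.output ini setting.
$encoded = http_build_query($post_data, '', '&');

// With an existing curl handle $ch:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $encoded);
```

Because the data is now a string, it is sent as application/x-www-form-urlencoded, and the receiving script sees $_POST['hobbies'] as an array.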
Using cookies with curl - automated logins to remote website
Cookies allow a server to store data on the client (the curl program in this case) so that the client sends the data back with later requests. This is useful for things like authenticating a client or storing session data. Cookies just contain data in name => value pairs, much like a PHP array.
Being able to authenticate on a website through curl means that the curl script can actually log in to a remote site as well. This is not difficult and requires just 3-4 lines of extra code. Of course the username and password would have to be present in the script too. Let's take an example
//gets the data from a URL
function get_url($url)
{
    $ch = curl_init();
    if($ch === false)
    {
        die('Failed to create curl object');
    }

    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');

    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

echo get_url('https://www.google.co.in/');
The above script fetches the url "https://www.google.co.in", which sends some cookies to save. The cookie-specific options are COOKIEFILE and COOKIEJAR. COOKIEJAR specifies the file where the cookie data should be saved. COOKIEFILE is the file from which the cookie data should be read to send in the next request. In this case, both are the same.
The cookie data is saved in cookie.txt which is located in the current working directory while running the script.
# Netscape HTTP Cookie File
# http://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.

.google.com TRUE / FALSE 1427285380 PREF ID=9096c8e7cd9f72f9:FF=0:TM=1364213380:LM=1364213380:S=js4r67txFLwOo3xg
#HttpOnly_.google.com TRUE / FALSE 1380024580 NID 67=bJYY_PSeVt7MO2FR3AVEhFAKb-M75LpQg0yjjLmF_liHpl2LlelgazjET0hfQ7966hNIB_utS8Ve0NmQPcaENhGlhgO9ByQdEuOfBI8oGgLnQZLUSjOesDqoKI4Ywqj7
.google.co.in TRUE / FALSE 1427285396 PREF ID=8440fee987c20fa2:FF=0:TM=1364213396:LM=1364213396:S=yvcIPIyHKxKRFOYp
#HttpOnly_.google.co.in TRUE / FALSE 1380024596 NID 67=UKFY159Qyt12345Xfitpo1j-GirhZ-UFyFNxyTAEEUGnNYfFNBkjjAEgBsNvHJeICUE_oLlTe9cd09O0EwdpngmZyxGhllXzZArJnQ2yB1ly1SoDe5S0gWVRt6V34MyD
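Each cookie line in this file holds seven fields: domain, a flag saying whether subdomains are included, path, a secure flag, expiry time as a unix timestamp, cookie name and cookie value. In the real file the fields are separated by tabs, and lines beginning with #HttpOnly_ are cookies too, not comments. If a script needs to inspect the saved cookies, the file could be read along these lines. This is only a sketch; parse_cookie_file and the array keys it returns are hypothetical names, not something the curl extension provides.

```php
<?php
// Hypothetical helper: parse the contents of a Netscape-format cookie file
// (as written by CURLOPT_COOKIEJAR) into a list of associative arrays.
function parse_cookie_file($contents)
{
    $cookies = array();
    foreach (preg_split('/\r\n|\r|\n/', $contents) as $line) {
        $http_only = false;
        // curl marks HttpOnly cookies with a "#HttpOnly_" prefix
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
            $http_only = true;
        }
        // Skip blank lines and real comments
        if (trim($line) === '' || $line[0] === '#') {
            continue;
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a well-formed cookie line
        }
        $cookies[] = array(
            'domain' => $fields[0],
            'include_subdomains' => ($fields[1] === 'TRUE'),
            'path' => $fields[2],
            'secure' => ($fields[3] === 'TRUE'),
            'expires' => (int) $fields[4],
            'name' => $fields[5],
            'value' => $fields[6],
            'http_only' => $http_only,
        );
    }
    return $cookies;
}
```

It could be called as parse_cookie_file(file_get_contents('cookie.txt')) after the curl handle has been closed and the cookie file written.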
If you need to save the cookie file in the same directory as the script then use the __FILE__ magic constant which has the full path to the currently executing script.
define('HOME' , dirname(__FILE__));
....
curl_setopt($ch, CURLOPT_COOKIEFILE, HOME . '/cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, HOME . '/cookie.txt');
By doing the above you can be sure of the location of the cookie file and don't have to hunt for it.
Check out an earlier post which explains how to do remote login with curl in php.
Downloading a remote file using curl
Just like the contents of a remote url can be fetched, a remote file with a given url can be downloaded and saved to local storage too.
/**
    Download remote file in php using curl
    Files larger than the php memory limit will fail or result in corrupted data
*/
$url = 'http://localhost/sugar.zip';
$path = '/var/www/lemon.zip';

$ch = curl_init($url);
if($ch === false)
{
    die('Failed to create curl handle');
}

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$data = curl_exec($ch);
curl_close($ch);

// curl_exec returns false when the transfer fails
if($data === false)
{
    die('File download failed');
}

file_put_contents($path, $data);
echo 'File download complete';
The above program can download remote files but has a limitation. If the file is larger than the memory available to PHP, then either a memory exhausted error is thrown or the downloaded file ends up corrupt.
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 60527991 bytes) in /var/www/curl.php on line 19
Hence the problem has to be fixed as shown in the next code example
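/**
    Download remote file in php using curl
    chunking with fopen
*/
$url = 'http://localhost/bang.zip';
$path = '/var/www/lemon.zip';

$fp = fopen($path, 'w');
if($fp === false)
{
    die('Failed to open file for writing');
}

$ch = curl_init($url);
if($ch === false)
{
    die('Failed to create curl handle');
}

// Write the response straight to the file instead of holding it in memory
curl_setopt($ch, CURLOPT_FILE, $fp);

// With CURLOPT_FILE set, curl_exec returns true or false, not the data
$success = curl_exec($ch);

curl_close($ch);
fclose($fp);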
The above code would download large files and save them without any problem. The option CURLOPT_FILE tells curl to write the output to a file.
Using proxy with curl - anonymous browsing
Curl also supports using a proxy server to perform http requests. In this example we are going to use the TOR proxy to do anonymous browsing with CURL.
Socks proxy
The following piece of code demonstrates how curl can be configured to use a socks5 proxy (TOR in this case).
//gets the data from a URL
function get_url($url , $proxy = false)
{
    $ch = curl_init();
    if($ch === false)
    {
        die('Failed to create curl object');
    }

    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

    //Set the tor proxy - the tor proxy is running at localhost port 9050
    if($proxy === true)
    {
        curl_setopt($ch, CURLOPT_PROXY, 'localhost:9050');
        curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
    }

    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

echo get_url('http://www.ipmango.com/' , true);
Opening the above PHP code in a browser should load ipmango.com via the TOR proxy, and the ip it shows should be different from your real public ip address.
The important curl options to set are PROXY and PROXYTYPE. The first one is the address of the proxy server and the second one specifies the type of the proxy. By default the proxy type is HTTP, and has to be changed to socks.
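One detail matters for anonymous browsing: with CURLPROXY_SOCKS5, curl resolves the host name itself before handing the connection to the proxy, so the DNS lookup does not go through TOR. The curl extension also defines CURLPROXY_SOCKS5_HOSTNAME, which asks the proxy to do the resolving. A sketch, assuming a PHP build where that constant is available; tor_proxy_options is a hypothetical helper name.

```php
<?php
// Hypothetical helper: curl options for a TOR SOCKS5 proxy on localhost:9050.
// With $remote_dns = true the proxy resolves host names instead of curl.
function tor_proxy_options($remote_dns = true)
{
    return array(
        CURLOPT_PROXY => 'localhost:9050',
        CURLOPT_PROXYTYPE => $remote_dns ? CURLPROXY_SOCKS5_HOSTNAME : CURLPROXY_SOCKS5,
    );
}

// With an existing curl handle $ch:
// curl_setopt_array($ch, tor_proxy_options());
```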
Http proxy
Just like we used SOCKS5 proxy, we can use an http proxy as well. Here we are going to use privoxy+tor.
//gets the data from a URL
function get_url($url , $proxy = false)
{
    $ch = curl_init();
    if($ch === false)
    {
        die('Failed to create curl object');
    }

    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

    //Set the tor proxy - privoxy is running at localhost port 8118
    if($proxy === true)
    {
        curl_setopt($ch, CURLOPT_PROXY, 'localhost:8118');
    }

    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

echo get_url('http://www.ipmango.com/' , true);
As before, the program fetches ipmango.com via the proxy. This time only the PROXY option has been set, since the PROXYTYPE option is http by default.
is anyone here, live?
Nice. but i don’t understand in real life programming where do we need it ?
specially posting data ? Can’t we just go to that website and fill form ?
Can we show output on browser while downloading a large file? I mean the progress of the download process.
Thank you for these clear and well formated examples!
I am having trouble trying to make a POST request to a external server to load the next page. The starting page has:
you would normally click on, with the corresponding script:
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
Setting line 17 of the “POST request” example to
$post_data = 'theForm.__EVENTTARGET.value=_ctl0$_ctl0$MainContent$MainContent$btnNext';
or
$post_data = 'theForm.__EVENTTARGET.value=_ctl0$_ctl0$MainContent$MainContent$btnNext&theForm.__EVENTARGUMENT.value=""';
only produces a blank screen.
the post_data does not appear to be correct.
first make the post request directly from your browser on the target website, and use Dom Inspector > Network tab to find out the correct post field name and post field value. Also note the post target url.
Then use the url and post data to construct the curl post query in php.
Thank you for a super fast reply, and kind assistance!
The only post field name and value corresponds to the product section id. I found no page reference at all, even when looking at cookies. But this is not my normal field of expertise… Under Posting Data I see:
__Eventtarget | _ctl0$_ctl0$MainContent$MainContent$btnNext
as well as __VIEWSTATE and __EVENTVALIDATION with long md5 type encoded textstrings.
Plus a bunch of other variables relating to the product items shown.
Maybe I could kindly ask you for a quick glance at http://www.4sound.se/section.aspx?id=1025001
It is the “[Nästa Sida >]” a tag on the top of the page I want to trigger. Kindly let me know if this is way out of my league or if it should be rather easy to accomplish.
Thank you again for your most appreciated help!
thanks for pointing out. added the missing content in the “using cookies” section
I do not see a section that says using cookies. In fact the article for me cuts off after the post section.
“The important options to set in a POST request are POST=true and POSTFIELDS=post_data. The first option tells curl that we want to make a P” <– Cuts off here regardless of what browser I view the article in.
there was an error.
please check now.