The specific benefits of a proxy server, and the reasons one is needed, are as follows.
- Security can be improved by preventing direct access from malicious users.
- By funneling all Internet access from inside the company through the proxy server, log information such as who viewed which site and when can be managed centrally.
- The content of sites that users browse is cached on the proxy server's disk (for example, an HDD). When a second or later user requests the same site, the proxy responds from the cache, which reduces Internet traffic and improves efficiency.
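The caching benefit in the last item can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class name CachingFetcher and the stand-in function fake_upstream are hypothetical, the cache lives in memory rather than on disk, and expiry and cache-control rules that a real proxy would need to honor are ignored.

```python
from typing import Callable, Dict


class CachingFetcher:
    """Toy model of a proxy cache: each URL is fetched from upstream at most once."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        self._fetch_upstream = fetch_upstream  # hypothetical upstream fetch function
        self._cache: Dict[str, bytes] = {}     # in memory here; a real proxy would use disk
        self.upstream_requests = 0             # counts requests that actually left the proxy

    def get(self, url: str) -> bytes:
        if url not in self._cache:
            # Cache miss: go out to the Internet once and remember the body.
            self.upstream_requests += 1
            self._cache[url] = self._fetch_upstream(url)
        # Cache hit (or freshly stored): answer from the cache.
        return self._cache[url]


def fake_upstream(url: str) -> bytes:
    # Stand-in for a real HTTP fetch so that the example runs offline.
    return f"<html>content of {url}</html>".encode()


proxy = CachingFetcher(fake_upstream)
first = proxy.get("http://www.example.com/news/index.html")   # first user: fetched upstream
second = proxy.get("http://www.example.com/news/index.html")  # second user: served from cache
```

After both calls, proxy.upstream_requests is 1: only the first request generated Internet traffic.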
In general, the term "proxy server" refers to a server that sits between the client and the web server and communicates with the web server on the client's behalf. In practice, however, proxies are not limited to this: they are used with various protocols and applications such as FTP, RADIUS, and SIP (although the mechanisms are not necessarily the same).
In this article, "proxy" by itself means an HTTP proxy, and only the HTTP proxy is explained.
The specific flow of the proxy server is as follows.
An important point, as steps 2 and 3 show, is that in proxied communication the proxy comes before DNS: the client sends its request to the proxy first, and the proxy server performs the DNS name resolution for the destination site. Therefore, in an environment that uses a proxy, changing the DNS server setting on the client has no effect on how the destination site's name is resolved.
Next, compare step 1 (the client's HTTP request) with step 4 (the proxy's HTTP request). They are almost identical, but differ in the following two points.
1. TCP port number
Port 80 can be used, but proxy servers commonly listen on ports such as 8080 or 3128.
2. The URI of the GET method
For example, when viewing "http://www.example.com/news/index.html" without a proxy, the request line is written as follows.
GET /news/index.html HTTP/1.1
The /news/index.html immediately after GET is the URI.
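When the connection goes through a proxy, it changes as follows: the URI becomes the full URL, including the scheme and the host name.
GET http://www.example.com/news/index.html HTTP/1.1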
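The two differences above (the TCP port and the URI after GET) can be put side by side in a short sketch. The function name plan_get and the proxy address proxy.example.com:8080 are assumptions made for illustration; the code only computes where a client would connect and which request line it would send, and performs no network access.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def plan_get(url: str, proxy: Optional[Tuple[str, int]] = None) -> Tuple[str, int, str]:
    """Return (host to connect to, TCP port, request line) for a plain-HTTP GET."""
    if proxy is not None:
        # Via a proxy: connect to the proxy's address and port,
        # and put the full URL after GET.
        proxy_host, proxy_port = proxy
        return proxy_host, proxy_port, f"GET {url} HTTP/1.1"

    # Direct: connect to the web server itself (port 80 unless the URL says
    # otherwise) and put only the path, plus any query string, after GET.
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.hostname or "", parts.port or 80, f"GET {target} HTTP/1.1"


url = "http://www.example.com/news/index.html"
direct = plan_get(url)
via_proxy = plan_get(url, proxy=("proxy.example.com", 8080))  # hypothetical proxy address
```

Here direct is ("www.example.com", 80, "GET /news/index.html HTTP/1.1"), while via_proxy is ("proxy.example.com", 8080, "GET http://www.example.com/news/index.html HTTP/1.1").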
Proxy connections for HTTPS communication
When communicating over HTTPS, the GET method cannot be sent to the proxy server the way it is with plain HTTP. TLS, which HTTPS uses, is built on the premise that the contents of the communication must not be known to anyone other than the two endpoints. The proxy server is no exception: the contents of the communication between the client and the web server must not be visible to the proxy server either.
As described above, the proxy server needs to rewrite the GET request, but because it cannot see the encrypted contents of HTTPS, it cannot rewrite them. For HTTPS communication, therefore, the client uses the CONNECT method instead of the GET method (for example, CONNECT www.example.com:443).
After receiving this, the proxy server returns the HTTP message "HTTP/1.1 200 Connection Established" to the client. From then on, for traffic from that combination of source IP address and source port number, the proxy does not touch the TLS and HTTPS contents at all; it only changes the IP and TCP headers and forwards the data to the web server.
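The CONNECT exchange described above can be sketched from the client's side. The function names build_connect_request and tunnel_established are hypothetical, and the sketch deals only with the bytes of the two messages; opening the socket and starting TLS inside the tunnel are left out.

```python
def build_connect_request(host: str, port: int = 443) -> bytes:
    """Build the CONNECT request a client sends to the proxy before starting TLS."""
    authority = f"{host}:{port}"
    return (
        f"CONNECT {authority} HTTP/1.1\r\n"
        f"Host: {authority}\r\n"
        "\r\n"
    ).encode("ascii")


def tunnel_established(response_head: bytes) -> bool:
    """Return True if the proxy's reply has a 2xx status, meaning the tunnel is open."""
    status_line = response_head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    fields = status_line.split(None, 2)  # e.g. ["HTTP/1.1", "200", "Connection Established"]
    return len(fields) >= 2 and fields[0].startswith("HTTP/") and fields[1].startswith("2")


request = build_connect_request("www.example.com")
ok = tunnel_established(b"HTTP/1.1 200 Connection Established\r\n\r\n")
refused = tunnel_established(b"HTTP/1.1 407 Proxy Authentication Required\r\n\r\n")
```

Only after a successful reply does the client begin the TLS handshake with the web server, and everything it sends from that point on is opaque to the proxy.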
Proxy access logs for HTTPS communication
In HTTPS communication, the GET request that would normally be recorded in the access log cannot be seen because it is encrypted with TLS. The proxy therefore logs the CONNECT request instead.
However, unlike GET, the CONNECT method does not name the file being retrieved, so the log can record only the FQDN. In other words, with HTTP the log can record the URL down to the file name, but with HTTPS it cannot record which file was accessed; only the host that was accessed remains in the log.
[For http] GET http://www.example.com/news/index.html
[For https] CONNECT www.example.com:443
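The difference in what can be logged follows directly from the two request lines. The sketch below uses a hypothetical function, loggable_target, that extracts the fields a proxy could record from each kind of request line; it is an illustration, not the log format of any particular proxy product.

```python
from urllib.parse import urlsplit


def loggable_target(request_line: str) -> dict:
    """Extract what a proxy can see in the request line it receives."""
    method, target, _version = request_line.split()
    if method == "CONNECT":
        # CONNECT carries only host:port; the path is inside the encrypted tunnel.
        host, _, port = target.rpartition(":")
        return {"method": method, "host": host, "port": int(port), "path": None}
    # Plain HTTP through a proxy carries the full URL, so the path is visible.
    parts = urlsplit(target)
    return {
        "method": method,
        "host": parts.hostname,
        "port": parts.port or 80,
        "path": parts.path or "/",
    }


http_entry = loggable_target("GET http://www.example.com/news/index.html HTTP/1.1")
https_entry = loggable_target("CONNECT www.example.com:443 HTTP/1.1")
```

For the HTTP request the path field is "/news/index.html"; for the CONNECT request it is None, because the proxy never sees it.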
Intercepting HTTPS communication with a proxy server
Basically, as described above, the proxy server cannot look into the contents of HTTPS communication. However, there are two exceptions in which it can.
The first is to give the proxy server the private key of the server with which you want to communicate over HTTPS. Realistically, however, it is impossible to obtain the private key of a server on the Internet such as google.com (if the private key could be obtained, that server's encryption could easily be defeated and its traffic intercepted). This configuration is feasible only in setups such as an in-house PC accessing an in-house server through an in-house proxy.
The second method is to install a server certificate on the proxy server (for example, one with common name = proxy.hoge.com) and have the proxy present that certificate to the client when an HTTPS connection arrives. This lets the proxy server decrypt the traffic once before passing it on. The client will accept the connection without a warning only if it has been set up to trust the certificate the proxy presents.
Also, when using a proxy server you normally configure the proxy settings in the browser, but if you deploy a "transparent proxy", clients can use the proxy server without any browser configuration.