<div dir="ltr"><font size="4"><font face="georgia, serif">Noma sana! Everyone needs to be using a VPN when browsing as articulated in this article (<a href="https://qz.com/945261/how-to-get-a-personal-vpn-and-why-you-need-one-now/?utm_source=qzfb">https://qz.com/945261/how-to-get-a-personal-vpn-and-why-you-need-one-now/?utm_source=qzfb</a>) in order to guard your privacy. Below is a snapshot of a part of the article that I found useful:</font> </font><div><br></div><div><font size="4">"<span style="background-color:rgb(249,249,249);color:rgb(76,76,76);font-family:"pt serif",georgia,serif">Opera is a popular web browser that comes with some excellent privacy features, like a free built-in VPN and a free ad blocker (and as you may know, ads can spy on you).</span></font></div><div><span style="background-color:rgb(249,249,249);color:rgb(76,76,76);font-family:"pt serif",georgia,serif"><font size="4"><br></font></span></div><div><font size="4"><span style="background-color:rgb(249,249,249);color:rgb(76,76,76);font-family:"pt serif",georgia,serif">If you just want a secure way to browse the web without ISPs being able to easily snoop on you and sell your data, Opera is a great start. Let’s install and configure it real quick. This takes less than 5 minutes.</span>"</font></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 3, 2017 at 8:13 AM, Rosemary Koech-Kimwatu via kictanet <span dir="ltr"><<a href="mailto:kictanet@lists.kictanet.or.ke" target="_blank">kictanet@lists.kictanet.or.ke</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><p dir="ltr">Hi Nanjira,</p>
<p dir="ltr">This is really insightful and thought-provoking.</p>
<p dir="ltr">Still digesting...</p>
<p dir="ltr">Kind regards,</p>
<p dir="ltr">Rosemary Koech-Kimwatu <br>
Advocate-FinTech and ICT Policy <br>
+254 718181644</p>
<div class="gmail_quote"><div><div class="h5">On Apr 2, 2017 10:12 PM, "Nanjira Sambuli via kictanet" <<a href="mailto:kictanet@lists.kictanet.or.ke" target="_blank">kictanet@lists.kictanet.or.ke</a><wbr>> wrote:<br type="attribution"></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div dir="auto"><div><div class="m_1592979447833685957m_6801193013188024543original-url">Highly enjoyable, and insightful read on metadata (and lack of protections thereof) to profile you for advertising etc.</div><div class="m_1592979447833685957m_6801193013188024543original-url">In the context of no privacy laws or regulations, our ISPs know quite a bit about us, and who knows what/how that info is (ab)used...</div><div class="m_1592979447833685957m_6801193013188024543original-url"><br><a href="http://www.privacypies.org/blog/metadata/2017/02/28/hakuna-metadata-1.html" target="_blank">http://www.privacypies.org/blo<wbr>g/metadata/2017/02/28/hakuna-<wbr>metadata-1.html</a><br><br></div><div id="m_1592979447833685957m_6801193013188024543article" style="font-family:'Iowan Old Style';font-size:1.2em;line-height:1.5em;margin:0px;padding:0px" class="m_1592979447833685957m_6801193013188024543iowan m_1592979447833685957m_6801193013188024543exported">
<div class="m_1592979447833685957m_6801193013188024543page" style="text-align:start;word-wrap:break-word;max-width:100%"><h1 class="m_1592979447833685957m_6801193013188024543title" style="font-size:2em;line-height:1.2em;margin-top:0px;margin-bottom:0.5em;font-weight:400;text-align:start;display:block;max-width:100%">Hakuna Metadata (1) - Exploring the browsing history.</h1>
<p style="max-width:100%">Since I joined European Digital Rights (EDRi) in September 2016, one of the hottest topics discussed in the Brussels bubble has been the review of the ePrivacy rules (ePR). As a complementary instrument to the General Data Protection Regulation (GDPR), the ePR mainly deals with data protection and privacy in the electronic communications sector, such as the tracking of users when they browse the internet. Since the GDPR has already been finalized, advocacy around the ePR is probably the last chance to defend European citizens' digital rights. One of my key responsibilities as the Ford-Mozilla <a href="https://advocacy.mozilla.org/en-US/open-web-fellows/fellows2016" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">Open Web Fellow</a> is to bring <a href="https://storyengine.io/stories/decentralization/joe-mcnamee/" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">practical understanding to policy/political debate</a>, and I agreed with Joe that I will work on the issues that need more technical clarification. One such blurry area in the ePR happens to be “metadata and the impact on privacy”. So, this article is an explainer about the power of metadata and the reason why we need stronger policies in that context.</p>
<h2 style="font-weight:bold;font-size:1.125em;max-width:100%">What is metadata?</h2>
<p style="max-width:100%">Without going too deep into the technical or <a href="http://bit.ly/2lGtClu" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">EU definitions</a> of metadata, let us simply understand it as data about data. The table below illustrates the difference between <strong style="max-width:100%">data and metadata</strong>.</p>
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/hm_table1.png" alt="Table 1: Data vs Metadata" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p style="max-width:100%">Data is often considered sensitive and, as personal property, something that has to be protected. It is possible to protect your data using encryption technologies, for example GNU Privacy Guard (GPG) for emails. Metadata, on the other hand, is not treated as very sensitive, and for the same reason there are not many methods to encrypt it. This is due to a technical shortcoming of the basic building blocks of the Internet Protocol (IP) stack. It does make sense not to encrypt the metadata, right? If we encrypted the recipient information on an email, the mail servers would not know whom to deliver it to.</p>
<p style="max-width:100%">When the internet protocols were built, the intention was merely to establish a communication channel to connect the world. At that point in time, there were few threats from government spying agencies, mass surveillance programs or advertisers. Today, however, we live in a world where everything we do on the Internet is tracked, putting our privacy up for sale on the data market. Even though metadata has been a gold mine for Internet Service Providers (ISPs) and telecommunication providers for the past two decades, its privacy risks only became a topic of debate after the <a href="http://uk.businessinsider.com/nsa-document-metadata-2016-12?r=US&IR=T" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">Snowden revelations</a>. Here are some quotes about the power of metadata from former big shots of government spying programs.</p>
<hr class="m_1592979447833685957m_6801193013188024543clear" style="max-width:100%;clear:both;background-color:rgba(0,0,0,0.2);height:1px;border:0px">
<hr class="m_1592979447833685957m_6801193013188024543clear" style="max-width:100%;clear:both;background-color:rgba(0,0,0,0.2);height:1px;border:0px">
<p style="max-width:100%"><em style="max-width:100%">“Metadata absolutely tells you everything about somebody’s life. If you have enough metadata, you don’t really need content.”</em>
- <strong style="max-width:100%">Stewart Baker</strong>, former NSA General Counsel</p>
<p style="max-width:100%"><em style="max-width:100%">“We kill people based on metadata.”</em>
- <strong style="max-width:100%">Michael Hayden</strong>, former director of the NSA and the CIA</p>
<hr class="m_1592979447833685957m_6801193013188024543clear" style="max-width:100%;clear:both;background-color:rgba(0,0,0,0.2);height:1px;border:0px">
<hr class="m_1592979447833685957m_6801193013188024543clear" style="max-width:100%;clear:both;background-color:rgba(0,0,0,0.2);height:1px;border:0px">
<p style="max-width:100%">Metadata was not invented to help privacy invaders; it was intended to speed up the classification and indexing of any kind of bulk data without looking at the data itself. So, in principle, metadata supports data protection by letting someone process the data without even looking at the content inside. However, that is also the fastest way to profile all internet users, right? In October 2015, Share Lab presented <a href="https://labs.rs/en/metadata/" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">this</a> piece of investigative journalism, which articulates the hidden power of email metadata. Indeed, it is scary to see what one can learn about personal behavior just from the “To”, “From”, “Subject” and “Timestamp” fields. Beyond the scary use cases, there are a handful of projects such as <a href="https://guardianproject.info/2017/02/24/combating-fake-news-with-a-smartphone-proof-mode/" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">Proofmode</a> (earlier known as Informacam - <a href="https://guardianproject.info/apps/informacam/" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">CameraV</a>) which harness the power of metadata to combat fake news. However, the projects that exploit that power for advertisement tracking and surveillance outnumber the genuine use cases of metadata.</p>
<h2 style="font-weight:bold;font-size:1.125em;max-width:100%">Browsing history and the potential threat actors</h2>
<p style="max-width:100%">Modern browsers such as Firefox, Google Chrome, Opera and Internet Explorer store the browsing history to provide a user-friendly browsing experience. By default, these browsers store the history of all previously visited websites, cached copies of the websites, form-filling history, cookie information and bookmarks. Depending on the operating system and the browser, this information is stored in a lightweight database at a specific location on your computer's hard disk. Some of us rant about this default behaviour, as it compels users to manually opt out of history storage and raises privacy concerns. Browser history, specifically the website information and cached copies, has its own advantages in terms of usability:</p>
<ol style="max-width:100%">
<li style="max-width:100%">Automatic completion/suggestion of previously visited URLs.</li>
<li style="max-width:100%">Locally cached copies of previously visited websites to speed up browsing, which is very helpful when the Internet connection is slow.</li>
</ol>
<p style="max-width:100%">At this point, it is obvious that our browsing history is accessible to our browsers, which is why it is highly recommended to use a trustworthy open-source browser such as <a href="https://www.mozilla.org/en-US/firefox/products/" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">Mozilla Firefox</a>, which protects and respects your privacy. If, on the other hand, you use browsers from companies that are themselves data brokers and advertisers, you end up giving away your browsing history to be tracked. So, assuming that we trust our browsers, let us exclude them from the threat actors in our model.</p>
<div class="m_1592979447833685957m_6801193013188024543scrollable" style="max-width:100%;overflow-x:scroll;word-wrap:normal"><table align="center" border="1" style="max-width:none;font-size:0.9em;text-align:start;word-wrap:break-word;border-collapse:collapse">
<tbody style="max-width:100%">
<tr style="max-width:100%">
<th colspan="1" rowspan="1" style="font-weight:bold;max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216);background-color:rgba(0,0,0,0.0235294)">
Entity
</th>
<th colspan="1" rowspan="1" style="font-weight:bold;max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216);background-color:rgba(0,0,0,0.0235294)">
Access to history
</th>
<th colspan="1" rowspan="1" style="font-weight:bold;max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216);background-color:rgba(0,0,0,0.0235294)">
Comments
</th>
</tr>
<tr style="max-width:100%">
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Malware in the computer</span>
</p>
</td>
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Full</span>
</p>
</td>
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Any program with adequate privileges to start a browser process and browse the web potentially has the capacity to leak your history. Such</span> <span style="max-width:100%"><a href="http://www.spamfighter.com/News-20261-Horrid-Piece-of-Android-Malware-Monitors-Browser-History-Texts-and-Banking-Information.htm" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">malware</a></span><span style="max-width:100%"> is in high demand on the darknet. Other than that, there is</span> <span style="max-width:100%"><a href="https://en.wikipedia.org/wiki/Browser_hijacking&" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">browser hijacking malware</a></span><span style="max-width:100%"> which pollutes your history.</span></p>
</td>
</tr>
<tr style="max-width:100%">
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Wi-Fi hotspot</span>
</p>
</td>
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Full</span>
</p>
</td>
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Using</span> <span style="max-width:100%"><a href="http://ieee-security.org/TC/SPW2016/MoST/slides/s2/t1.pdf" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">captive Wi-Fi</a></span><span style="max-width:100%"> is a common practice in many places, especially when using</span> <span style="max-width:100%"><a href="http://qurinet.ucdavis.edu/pubs/conf/Ningning_INFOCOM13.pdf" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">public hotspots</a></span><span style="max-width:100%">.</span></p>
</td>
</tr>
<tr style="max-width:100%">
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Internet Service Providers (ISPs)</span>
</p>
</td>
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Almost full</span>
</p>
</td>
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">ISPs can glean many insights,</span> <span style="max-width:100%"><a href="https://www.teamupturn.com/reports/2016/what-isps-can-see" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">even when the traffic is encrypted</a></span><span style="max-width:100%">. Have a look at “</span><span style="max-width:100%"><a href="https://events.ccc.de/congress/2010/Fahrplan/attachments/1791_27C3-JeroenMassar-HowTheInternetSeesYou.pdf" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">How the Internet sees you</a></span><span style="max-width:100%">”.</span></p>
<p style="max-width:100%"><span style="max-width:100%">HTTP: The ISP knows which pages you're visiting and could see the data you send and receive.</span>
</p>
<p style="max-width:100%"><span style="max-width:100%">HTTPS: The ISP knows which domain you've visited but not the URL parameters, and not the contents of any data you send or receive.</span>
</p>
</td>
</tr>
<tr style="max-width:100%">
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Domain Name System (DNS) Providers</span>
</p>
</td>
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Partial</span>
</p>
</td>
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Only the domain name queries, not the complete URL.</span>
</p>
</td>
</tr>
<tr style="max-width:100%">
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Cookies (tracking, advertising and profiling companies)</span>
</p>
</td>
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Partial to almost full (depending on whose cookie it is)</span>
</p>
</td>
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Based on cookie origin policies, cookies from Website A can collect only the history related to Website A.</span>
</p>
</td>
</tr>
<tr style="max-width:100%">
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Websites that you visit</span>
</p>
</td>
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Partial</span>
</p>
</td>
<td colspan="1" rowspan="1" style="max-width:100%;padding:0.25em 0.5em;border:1px solid rgb(216,216,216)">
<p style="max-width:100%"><span style="max-width:100%">Any websites that you visit would obviously know that you have visited them.</span>
</p>
</td>
</tr>
</tbody>
</table></div>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Table 2: Access to browsing history</em>
</p>
<p style="max-width:100%">In spite of the clear privacy implications, the law gives no clarity on whether browsing history (more specifically, the URLs) is to be protected as content or as non-content metadata. Most lobbyists have expressed dissatisfaction with the changes between the leaked and the official ePR proposals. Among many other concerns, the most questionable part of the ePrivacy Regulation, at least for me, is the <a href="http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=41461" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">permitted usage and exceptions</a> concerning the contents of communication. Things are a bit more complicated than that on two levels. Firstly, there are various cross-cutting issues (consent, tracking, ISPs, “value-added services”, etc.) where metadata analysis comes up. Exceptions for web analytics could imply serious privacy concerns without stronger guarantees of statistical privacy.</p>
<p style="max-width:100%">Without going further into that debate, let us explore browsing history, as it provides a rich source of metadata about our daily interactions with the internet. For the sake of simplicity, and to understand the power of this chunk of metadata, let us assume the threat actor is a malicious ISP that can completely or partially see our browsing metadata and does not respect privacy policies.</p>
<h2 style="font-weight:bold;font-size:1.125em;max-width:100%">Analytics about the web history</h2>
<p style="max-width:100%">Based on the browsing history stored on my computer, below is a simple analysis of the website domain names that I have visited the most. Like any other “normal” internet user, I have used Google as my search engine; spent an ample amount of time on social media sites such as Twitter, Facebook and LinkedIn; watched videos on YouTube; used Wikipedia as my primary source of information; shopped on Amazon; sought programming help on GitHub and Stack Overflow; and so on.</p>
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/1_Stats_1.png" alt="Figure 1: Most visited domain names" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 1: Most visited domain names</em>
</p>
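<p style="max-width:100%">A tally like the one behind Figure 1 can be sketched in a few lines of Python. This is only a minimal illustration, not the script used for the article: it assumes the history has already been exported as a plain list of URLs, and the function name <code>top_domains</code> and the sample entries are hypothetical.</p>

```python
from collections import Counter
from urllib.parse import urlsplit


def top_domains(urls, n=20):
    """Return the n most frequently visited host names in a list of URLs."""
    hosts = []
    for url in urls:
        host = urlsplit(url).hostname  # None for malformed or scheme-less entries
        if host:
            # Treat "www.example.org" and "example.org" as the same site.
            hosts.append(host[4:] if host.startswith("www.") else host)
    return Counter(hosts).most_common(n)


# Hypothetical history entries, for illustration only.
history = [
    "https://www.google.com/search?q=metadata",
    "https://twitter.com/home",
    "https://www.google.com/maps",
    "http://en.wikipedia.org/wiki/Metadata",
]
print(top_domains(history, n=2))
```

<p style="max-width:100%">Note that only host names are needed for this ranking, which is exactly the part of the URL that remains visible to an ISP even for HTTPS traffic.</p>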
<p style="max-width:100%">According to the ePR, and in line with global norms for deriving useful insights about users, ISPs can use such analytics for survey purposes. In genuine use cases, this kind of statistic helps with fine-tuning bandwidth for the websites that users visit most. Even though the top 20 websites remain the same across all parts of the world, the websites that appear after the top 20 vary with the demographics and social structure of a region. Your contribution to big data analytics starts right here, just by contributing the domain names of the websites that you have visited. The same chunk can be used for profiling you as well. Maybe the websites in Figure 1 are common to everyone and do not really set you apart. But imagine some porn sites or your favourite political party's web page! Well, that makes you a little different from others, right?</p>
<p style="max-width:100%">Figure 2 below shows the suffixes or, more technically, the top-level domains (TLDs) of the websites that I have visited the most. In many cases, TLDs represent the countries that the websites are affiliated with. Also, websites like Google change the TLD depending on the country from which you are browsing. For instance, even if you type <a href="http://www.google.com" target="_blank">www.google.com</a> from Belgium, you will be redirected to <a href="http://www.google.be" target="_blank">www.google.be</a> automatically. Based on Figure 2, one can easily tell that I have connections with Finland (.fi), Belgium (.be), India (.in and .<a href="http://co.in" target="_blank">co.in</a>) and some academic affiliation (.edu). While your ISP will obviously know this information, imagine the case when you are travelling!</p>
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/1_Stats_2.png" alt="Figure 2: Suffix (TLDs) of most visited websites" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 2: Suffix (TLDs) of most visited websites</em>
</p>
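<p style="max-width:100%">The suffix breakdown in Figure 2 can be approximated in the same way. The sketch below is an assumption about how such a count might be done, not the article's own code. Two-label suffixes such as .co.in are handled with a small hand-written set (<code>TWO_LABEL_SUFFIXES</code>, hypothetical and far from complete); a real analysis would consult the full Public Suffix List.</p>

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumption: a tiny hand-written set of two-label suffixes, for illustration.
TWO_LABEL_SUFFIXES = {"co.in", "co.uk", "ac.uk"}


def suffix_of(url):
    """Return the domain suffix of a URL (".be", ".co.in", ...) or None."""
    host = urlsplit(url).hostname
    if not host or "." not in host:
        return None
    labels = host.lower().split(".")
    last_two = ".".join(labels[-2:])
    if len(labels) >= 3 and last_two in TWO_LABEL_SUFFIXES:
        return "." + last_two
    return "." + labels[-1]


def suffix_counts(urls):
    """Count how often each suffix occurs in a list of URLs."""
    return Counter(s for s in map(suffix_of, urls) if s)
```

<p style="max-width:100%">Feeding the history into <code>suffix_counts</code> gives one counter entry per suffix, which is all that is needed to guess at a user's country affiliations.</p>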
<p style="max-width:100%">Even when you are in a foreign country, you still visit websites related to your home country. So your geographic affiliation or affinity is now evident to the ISP of the foreign country, and to the DNS providers as well. At this point, you have contributed a second chunk of information to big data and profiling, and to two of the entities that can collect data about you.</p>
<h2 style="font-weight:bold;font-size:1.125em;max-width:100%">Browsing patterns</h2>
<p style="max-width:100%">If the internet traffic is HTTP, everything is transmitted in plain text, so ISPs can see the full path of the URL (<a href="http://www.facebook.com/zuck" target="_blank">http://www.facebook.com/zuck</a>). With HTTPS, on the other hand, only the domain name is visible (<a href="https://www.facebook.com/" target="_blank">https://www.facebook.com/</a>) to the ISPs. To learn more about how the Internet works, refer to EDRi’s <a href="https://edri.org/papers/how-the-internet-works/" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">paper</a> on the same topic.</p>
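<p style="max-width:100%">The difference between the two cases can be made concrete with a small sketch. It is a simplification under stated assumptions: the function name <code>visible_to_isp</code> is hypothetical, and it models only the URL, ignoring other signals such as packet sizes and timing.</p>

```python
from urllib.parse import urlsplit


def visible_to_isp(url):
    """Approximate which part of a URL an on-path observer can read."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Plain text: host, path and query string are all readable.
        return url
    if parts.scheme == "https":
        # Encrypted: only the host name leaks (for example through DNS
        # lookups or the TLS handshake); path and query stay hidden.
        return f"https://{parts.hostname}/"
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")


print(visible_to_isp("http://www.facebook.com/zuck"))
print(visible_to_isp("https://www.facebook.com/zuck"))
```

<p style="max-width:100%">The first call returns the URL unchanged, while the second is reduced to the bare domain, mirroring the two rows for ISPs in Table 2.</p>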
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/2_anomies.png" alt="Figure 3: Number of unique URLs visited over time" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 3: Number of unique URLs visited over time</em>
</p>
<p style="max-width:100%">Since the full path of the URL is visible to ISPs when your traffic is not encrypted, they can start analysing your behavior online. Figure 3 shows the number of unique URLs that I visited over time, based on my browsing history. As one can see, I visited 10-150 unique URLs on average over the period from November 2015 to January 2017. Peaks in the graph beyond this range are anomalies in my browsing pattern. These anomalies could potentially indicate specific events in my life: increased workload, planning my travel, searching for a job or anything else that you can imagine.</p>
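<p style="max-width:100%">A series like the one in Figure 3 could be computed roughly as follows. This is a sketch under assumptions: the visits are taken to be (timestamp, URL) pairs, the counts are bucketed per calendar day (the article does not state the bucket size), and the 10-150 band from the text is used as a crude anomaly threshold. Both function names are hypothetical.</p>

```python
from collections import defaultdict
from datetime import datetime


def unique_urls_per_day(visits):
    """Map each calendar day to the number of distinct URLs visited on it."""
    seen = defaultdict(set)
    for when, url in visits:
        seen[when.date()].add(url)
    return {day: len(urls) for day, urls in sorted(seen.items())}


def days_outside_range(daily_counts, low=10, high=150):
    """Return the days whose count falls outside the usual band."""
    return [day for day, n in daily_counts.items() if not (low <= n <= high)]


# Hypothetical visits, for illustration only.
visits = [
    (datetime(2016, 5, 2, 9, 15), "https://example.org/a"),
    (datetime(2016, 5, 2, 9, 40), "https://example.org/b"),
    (datetime(2016, 5, 2, 17, 5), "https://example.org/a"),
    (datetime(2016, 5, 3, 10, 0), "https://example.org/c"),
]
daily = unique_urls_per_day(visits)
print(days_outside_range(daily, low=2, high=150))
```

<p style="max-width:100%">A fixed band is the simplest possible rule; it is used here only because the text quotes one, and a real analysis might prefer a rolling average or similar.</p>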
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/3_browsing_pattern_full.png" alt="Figure 4: Heatmap of browsing pattern - unique URLs visited over time" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 4: Heatmap of browsing pattern - unique URLs visited over time</em>
</p>
<p style="max-width:100%">Another way of looking at the browsing patterns is to plot a heatmap of the same data, i.e. the number of unique URLs visited over time, as shown in Figure 4. While Figure 3 shows the anomalies in the browsing pattern, the heatmap gives a snapshot of the lifestyle in an easily understandable manner.</p>
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/4_browsing_pattern_sleeping_time_2.png" alt="Figure 5: Heatmap of browsing pattern - sleeping (idle) and leisure time" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 5: Heatmap of browsing pattern - sleeping (idle) and leisure time</em>
</p>
<p style="max-width:100%">There are consistent patterns in the lower half and the upper quarter of the graph. Even within those patterns, we can see two different sets, which depict my work-time browsing and my after-work leisure activities, which fade out from 20:00 onwards. In Figure 5, from 12:00 AM till 07:00 AM, there is a constant dark strip which represents less activity over the internet; in other words, it is the time when I sleep.</p>
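<p style="max-width:100%">Underneath, a heatmap like those in Figures 4 and 5 is just a day-by-hour grid of counts. The sketch below shows one way to build such a grid and to read off the idle hours; it assumes (timestamp, URL) pairs as input, and the names <code>hourly_grid</code> and <code>quiet_hours</code> are hypothetical rather than taken from the article.</p>

```python
from collections import defaultdict
from datetime import datetime


def hourly_grid(visits):
    """Build {date: [c0, ..., c23]} from (datetime, url) pairs.

    c_h is the number of distinct URLs visited on that date during hour h.
    """
    buckets = defaultdict(lambda: [set() for _ in range(24)])
    for when, url in visits:
        buckets[when.date()][when.hour].add(url)
    return {day: [len(s) for s in hours] for day, hours in sorted(buckets.items())}


def quiet_hours(grid):
    """Hours of the day with no recorded activity on any date in the grid."""
    return [h for h in range(24) if all(row[h] == 0 for row in grid.values())]
```

<p style="max-width:100%">Each row of the grid can be handed to any plotting library as one line of the heatmap, and the hours returned by <code>quiet_hours</code> correspond to the dark strip described above.</p>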
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/5_browsing_pattern_travel.png" alt="Figure 6: Heatmap of browsing pattern - travel" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 6: Heatmap of browsing pattern - travel</em>
</p>
<p style="max-width:100%">As highlighted in Figure 6, there are certain patches within the strip of my sleeping pattern. When correlated with the change in domain name suffixes (with reference to Figure 2), these turned out to be work-related travel. In other words, I had travelled to a different time zone and continued to work from 9:00 AM to 7:00 PM as on any other regular day.</p>
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/6_browsing_pattern_holiday.png" alt="Figure 7: Heatmap of browsing pattern - Holiday season" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 7: Heatmap of browsing pattern - Holiday season</em>
</p>
<p style="max-width:100%">If we zoom further into the graph (as represented in Figure 7), there are patterns which show a high amount of browsing, then a patch of almost no activity even during regular working hours, then a sudden increase in browsing activity and finally a return to the normal working-hour pattern. This shows that I planned my holiday (checking in for flights, confirming hotel bookings, etc.), took a break from work, returned from the holiday (the sudden increase is possibly due to following up on emails and activities that I might have missed during my trip) and finally resumed my work.</p>
<p style="max-width:100%">So, at this point, one can know about my working hours, sleep time, work-related travel and holiday schedules just from my browsing metadata. That is quite a lot of information about me retrieved just from the metadata, right?</p>
<h2 style="font-weight:bold;font-size:1.125em;max-width:100%">Potential adwords</h2>
<p style="max-width:100%">As mentioned earlier, browsing history falls into the grey area of whether it should be treated and protected as content data or as non-content metadata. With many other kinds of metadata, it is not possible to retrieve the complete content just from the metadata associated with it; with browsing history, however, it is possible to retrieve all the contents of the websites that you have visited by crawling the list of URLs. Whether or not it happens in reality, ISPs can avoid giving the list of URLs directly to advertisers by automating their analytics systems to crawl the list and gain insight into what you might have seen while browsing. By giving advertisers just the keywords deduced from the websites that you have visited, ISPs can potentially bypass privacy laws by claiming the data is anonymized.</p>
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/7_generic_word_cloud.png" alt="Figure 8: Wordcloud generated by crawling over the list of URLs from the browsing history" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 8: Wordcloud generated by crawling over the list of URLs from the browsing history</em>
</p>
<p style="max-width:100%">Figure 8 shows the wordcloud generated by crawling the websites I visited most. Not surprisingly, as a security and privacy researcher, I can see those words in this cloud, along with other keywords related to my identity, from both my professional and personal life. This cloud was derived after excluding all social media and search engine URLs.</p>
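<p style="max-width:100%">The keyword counts that feed a wordcloud like Figure 8 can be approximated with the Python standard library alone. The sketch below is an illustration, not the article's pipeline: it assumes the pages have already been downloaded as HTML strings, uses a deliberately tiny stop-word list, and the names <code>TextExtractor</code> and <code>keyword_counts</code> are hypothetical.</p>

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: a deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"the", "and", "for", "that", "with", "this", "from", "are"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def keyword_counts(pages):
    """Count words of three or more letters across a list of HTML strings."""
    counts = Counter()
    for page in pages:
        parser = TextExtractor()
        parser.feed(page)
        parser.close()
        words = re.findall(r"[a-z]{3,}", " ".join(parser.chunks).lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts
```

<p style="max-width:100%">Calling <code>keyword_counts(pages).most_common(50)</code> on the crawled pages yields the kind of keyword list that could be handed to advertisers in place of the raw URLs.</p>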
<p style="max-width:100%">Two other buzzwords which we often hear these days are “data mining” and “machine learning”. Data mining refers to programmatically seeking useful insights from collected bulk data, whereas machine learning uses such insight for data-driven decision making. One technique from these fields is <a href="https://en.wikipedia.org/wiki/Named-entity_recognition" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">Named Entity Recognition (NER)</a>, which classifies text into categories such as organizations, persons and locations.</p>
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/8_org_word_cloud.png" alt="Figure 9: Wordcloud generated by Named-Entity Recognition - Organizational entities" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 9: Wordcloud generated by Named-Entity Recognition - Organizational entities</em>
</p>
<p style="max-width:100%">If we run NER algorithms on the text retrieved by crawling the list of URLs you have visited, the keywords that can be generated become more specific. Figure 9 shows the keywords related to organizational entities from the websites I have visited. This narrows my generic profile down, so that I can be targeted on the keywords found in this cloud. For example, I could be a potential customer for insurance companies, or a candidate for university and management-related jobs.</p>
<p style="max-width:100%">Further down the line, Figure 10 shows the names of people found on the websites I have visited the most. Surprisingly, I turned out to be a self-obsessed person who visits his own websites or websites that talk about him. In the person-names cloud, I can see some of my academic co-authors, role models and people I follow. Imagine if the names Tim Cook or Steve Jobs appeared there: I would probably be a potential customer for Apple! The list of adwords targeted at me could then include Apple products from here onwards.</p>
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/9_person_word_cloud.png" alt="Figure 10: Wordcloud generated by Named Entity Recognition - Person names" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 10: Wordcloud generated by Named Entity Recognition - Person names</em>
</p>
<p style="max-width:100%">How about my next travel destination? Can it be predicted from my web history? Possibly yes: it could be Brazil, China or Singapore!</p>
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/10_loc_word_cloud.png" alt="Figure 11: Wordcloud generated by Named Entity Recognition - Location entities" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 11: Wordcloud generated by Named Entity Recognition - Location entities</em>
</p>
<p style="max-width:100%">As Figure 11 shows, I may have visited websites mentioning locations that could be my next travel destination. Even without any fancy machine learning, I can attest that these are indeed some of the places I am planning to visit!</p>
<p style="max-width:100%">As mentioned before, if you are using HTTP, the ISPs can see the full URL path in clear text. Along with them, the websites that you visit will obviously have to know that full path to deliver you exactly what you are looking for.</p>
<p style="max-width:100%">If you have searched for “vegetarian restaurants in Brussels” on Google, your Google query URL will be <a href="https://www.google.be/search?q=vegetarian+restaurant+brussel" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">http://www.google.be/search?q=<wbr>vegetarian+restaurant+brussel</a>. Assuming that the ISPs again use the keywords you search for to profile you, this makes their job of deriving adwords for your future targeted advertisements much easier.</p>
<p style="max-width:100%"><i style="max-width:100%">Please note that Google auto-redirects HTTP traffic to HTTPS; for the sake of simplicity, let us ignore that. Moreover, there are malicious techniques (such as “man-in-the-middle”) that an ISP could use in cooperation with advertisers to learn more even from HTTPS traffic.</i></p>
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/11_query_stats_2.png" alt="Figure 12: Word frequency graph of Google search keywords" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 12: Word frequency graph of Google search keywords</em>
</p>
<p style="max-width:100%">Figure 12 shows the words I search for most often on Google. From this graph, it is evident that I use the Python programming language, write reports in LaTeX, use a computer running Ubuntu, do research on security and privacy, and so on. This alone, together with the previous word clouds, would be enough to profile me.</p>
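<p style="max-width:100%">The search-keyword extraction described above can be sketched with Python's standard <strong style="max-width:100%">urllib.parse</strong> module. The function name <strong style="max-width:100%">search_terms</strong> and the sample history are hypothetical, and the snippet assumes that search URLs have the same shape as the example query URL (a Google host, a /search path and the terms in the q parameter).</p>
<pre style="max-width:100%">
```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def search_terms(urls):
    """Count the words found in the q parameter of Google-style search URLs."""
    counts = Counter()
    for url in urls:
        parts = urlparse(url)
        # Assumption: only URLs shaped like the example query URL are searches.
        if "google" not in parts.netloc or parts.path != "/search":
            continue
        # parse_qs decodes "+" into spaces, so each value is a plain phrase.
        for query in parse_qs(parts.query).get("q", []):
            counts.update(query.lower().split())
    return counts


# Made-up history entries; only the first one comes from the text above.
history = [
    "http://www.google.be/search?q=vegetarian+restaurant+brussel",
    "http://www.google.be/search?q=python+latex+table",
    "http://www.google.be/search?q=python+ubuntu+install",
    "http://example.org/about",
]
print(search_terms(history).most_common(3))
```
</pre>
<p style="max-width:100%">Sorting these counts gives the raw data behind a word frequency graph such as Figure 12.</p>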
<p style="max-width:100%">So, at this point, the ISPs know what makes you highly likely to click an advertisement link!</p>
<h2 style="font-weight:bold;font-size:1.125em;max-width:100%">Pseudo social sphere</h2>
<p style="max-width:100%">Unlike the metadata related to emails and phone call logs, browsing history can be treated as one-dimensional metadata, because it only describes what you have browsed and does not capture other people’s interactions with you. Email and phone call metadata, on the other hand, contain the interactions you have initiated with others as well as the interactions others have initiated with you.</p>
<p style="max-width:100%">However, it is possible to gain insight into your affinity towards the people in your social circle using the one-dimensional browsing history metadata. For example, you will visit your close friend’s social media profile more frequently than the profile of an ex-colleague you know from your first job. You might have visited the profile of a friend from university more recently and more frequently than that of a friend from high school. By capturing visit counts and frecency (frequency + recency) from your browsing history, it is possible to reconstruct a pseudo social sphere (Figure 13), thereby converting the browsing history into a two-dimensional data source.</p>
<p style="max-width:100%"><img src="http://www.privacypies.org/assets/images/soc-circle.png" alt="Figure 13: Representation of pseudo-social sphere derived from social media related URLs" style="max-width:100%;margin:0.5em auto;display:block;height:auto"></p>
<p align="center" style="max-width:100%">
<em style="max-width:100%">Figure 13: Representation of pseudo-social sphere derived from social media related URLs</em>
</p>
<p style="max-width:100%">We all have different social circles: family members, childhood/high school friends, friends from the workplace, ex-colleagues, etc. Our affinity towards each person is not necessarily unique: even when people are not directly connected with each other, it is highly likely that our affinity towards some of them is similar. Figure 13 represents one such social sphere, built by capturing social media URLs (Facebook and Twitter). In circle 1, I saw a family member and my best friend; in circle 2, one of my colleagues, a high school friend and a family member. This means that I weigh people differently, but they can be grouped based on my affinity towards them.</p>
<p style="max-width:100%">Just by knowing my browsing history, the ISPs can now tell who my close friends are, how much they matter to me, and which of them are equally important in my life.</p>
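<p style="max-width:100%">To make the idea of frecency-based grouping concrete, here is a rough sketch under stated assumptions. The exponential weighting, the 30-day half-life, the circle boundaries and the profile names are all hypothetical choices made for this example; they are neither Firefox's own frecency formula nor necessarily the method behind Figure 13.</p>
<pre style="max-width:100%">
```python
from collections import defaultdict


def frecency_scores(visits, now_day, half_life_days=30.0):
    """Score each profile: every visit counts, and recent visits count more."""
    scores = defaultdict(float)
    for profile, day in visits:
        age = now_day - day
        # Hypothetical weighting: a visit loses half its weight
        # every half_life_days days.
        scores[profile] += 0.5 ** (age / half_life_days)
    return dict(scores)


def circles(scores, boundaries=(2.0, 0.5)):
    """Map each profile to circle 1 (closest) up to len(boundaries) + 1."""
    result = {}
    for profile, score in scores.items():
        circle = 1
        for bound in boundaries:
            if score >= bound:
                break
            circle += 1
        result[profile] = circle
    return result


# Made-up visits: (profile name, day number on which the profile was visited).
visits = [
    ("best_friend", 98), ("best_friend", 99), ("best_friend", 100),
    ("colleague", 70), ("colleague", 100),
    ("ex_colleague", 10),
]
scores = frecency_scores(visits, now_day=100)
print(circles(scores))
```
</pre>
<p style="max-width:100%">Profiles that end up in the same circle have similar scores, which mirrors the observation that people who are not connected with each other can still share the same level of affinity.</p>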
<h2 style="font-weight:bold;font-size:1.125em;max-width:100%">To summarize:</h2>
<p style="max-width:100%">I built a small, naive tool that replicates graphs similar to those shown in this article for almost anyone who is a Linux+Firefox user, browses the Internet (including social media) like anyone else and, most importantly, stores browsing history for a decent period of time. To keep the tool as generic and simple as possible, I had to leave out further information that could have been dug out of my own browsing history and exclude the use of APIs (as they require individual users to obtain API tokens). To learn more about what browsing history can reveal about your personality, refer to the case study by Share Lab, which provides many more insights into what can be dug out of your browsing history.</p>
<p style="max-width:100%">Whether or not the culprit ISPs depicted in this article invade your privacy by doing all these analytics, it is important to realize the power of metadata and your contribution to big data processing in the wild. Since the privacy of metadata cannot be protected merely by encrypting it, we need stronger policies to defend our digital rights.</p>
<p style="max-width:100%">The tool, which I call Hakuna Metadata, can be downloaded from <a href="https://github.com/sidtechnical/hakuna-metadata-1" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank"><strong style="max-width:100%">here</strong></a>. Once you have downloaded it, follow these instructions:</p>
<ul style="max-width:100%">
<li style="max-width:100%">Unzip the folder, right-click on a blank area inside it, and click “Open in Terminal”.</li>
<li style="max-width:100%">In the terminal, type <strong style="max-width:100%">sh requirements</strong> and press <strong style="max-width:100%">Enter</strong>.</li>
<li style="max-width:100%">This will download all the necessary modules needed to run the tool.</li>
<li style="max-width:100%">Once it is completed, type <strong style="max-width:100%">python tool.py</strong> and press <strong style="max-width:100%">Enter</strong>.</li>
<li style="max-width:100%">It will take some time to process your browsing history, so be patient until the tool opens a new browser tab with the results. Everything is processed on your own computer; the tool does not send the data anywhere.</li>
<li style="max-width:100%">The newly opened tab will contain some instructions and links to the visualizations derived from your browsing history.</li>
<li style="max-width:100%">Please note that these graphs are interactive as shown <a href="https://github.com/sidtechnical/sidtechnical.github.io/blob/master/assets/images/bh_heatmap.gif?raw=true" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">here</a>, <a href="https://github.com/sidtechnical/sidtechnical.github.io/blob/master/assets/images/bh_anamoly.gif?raw=true" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">here</a>, <a href="https://github.com/sidtechnical/sidtechnical.github.io/blob/master/assets/images/bh_search.gif?raw=true" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">here</a> or <a href="https://github.com/sidtechnical/sidtechnical.github.io/blob/master/assets/images/bh_soccirc.gif?raw=true" style="color:rgb(65,110,210);max-width:100%;text-decoration:underline" target="_blank">here</a>.</li>
<li style="max-width:100%">It is important to note that some of the functionality may not work exactly as shown in this article, mainly because there is no reference dataset of browsing history, so I had to build the tool based on my own browsing history.</li>
<li style="max-width:100%">It goes without saying that the code is open source, and any contribution that improves it or adds functionality is more than welcome. In case of issues, do not hesitate to contact me, either by sending an email with the subject line “Hakuna Metadata” to <a href="mailto:sidtechnical@gmail.com" target="_blank">sidtechnical@gmail.com</a> or by raising an issue on GitHub.</li>
</ul>
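<p style="max-width:100%">The tool's own code is not reproduced here. As a rough, hypothetical illustration of how a local script could read Firefox history without sending anything over the network, the sketch below queries a table laid out like <strong style="max-width:100%">moz_places</strong> in Firefox's places.sqlite file, assuming it has url and visit_count columns. The function name <strong style="max-width:100%">most_visited</strong> and the sample rows are made up, and an in-memory database stands in for the real file.</p>
<pre style="max-width:100%">
```python
import sqlite3


def most_visited(conn, limit=10):
    """Return (url, visit_count) rows, most visited first."""
    query = (
        "SELECT url, visit_count FROM moz_places "
        "WHERE visit_count > 0 "
        "ORDER BY visit_count DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


# Stand-in for a copy of places.sqlite: an in-memory database holding only
# the two columns this sketch reads (the real table has many more).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moz_places (url TEXT, visit_count INTEGER)")
conn.executemany(
    "INSERT INTO moz_places VALUES (?, ?)",
    [
        ("https://example.org/news", 12),
        ("https://example.org/mail", 40),
        ("https://example.org/never-opened", 0),
    ],
)
top_sites = most_visited(conn, limit=2)
print(top_sites)
```
</pre>
<p style="max-width:100%">Against a real profile, you would connect to a copy of places.sqlite instead of the in-memory database, since Firefox usually keeps the original file locked while it is running.</p>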
</div></div></div><div><br><br><div>Regards,</div><div>Nanjira. </div><div><br></div>Sent on the move. </div></div><br></div></div>______________________________<wbr>_________________<br>
kictanet mailing list<br>
<a href="mailto:kictanet@lists.kictanet.or.ke" target="_blank">kictanet@lists.kictanet.or.ke</a><br>
<a href="https://lists.kictanet.or.ke/mailman/listinfo/kictanet" rel="noreferrer" target="_blank">https://lists.kictanet.or.ke/m<wbr>ailman/listinfo/kictanet</a><br>
Twitter: <a href="http://twitter.com/kictanet" rel="noreferrer" target="_blank">http://twitter.com/kictanet</a><br>
Facebook: <a href="https://www.facebook.com/KICTANet/" rel="noreferrer" target="_blank">https://www.facebook.com/KICTA<wbr>Net/</a><br>
<br>
Unsubscribe or change your options at <a href="https://lists.kictanet.or.ke/mailman/options/kictanet/chemukoechk%40gmail.com" rel="noreferrer" target="_blank">https://lists.kictanet.or.ke/m<wbr>ailman/options/kictanet/chemuk<wbr>oechk%40gmail.com</a><br>
<br>
The Kenya ICT Action Network (KICTANet) is a multi-stakeholder platform for people and institutions interested and involved in ICT policy and regulation. The network aims to act as a catalyst for reform in the ICT sector in support of the national aim of ICT enabled growth and development.<br>
<br>
KICTANetiquette : Adhere to the same standards of acceptable behaviors online that you follow in real life: respect people's times and bandwidth, share knowledge, don't flame or abuse or personalize, respect privacy, do not spam, do not market your wares or qualifications.<br></blockquote></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div>Best Regards,<br><br></div></div></div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><div><div><div><font size="2">Kelvin Kariuki</font></div></div></div></div><div><div><div><div><font size="2">Twitter Handle: @teacherkaris</font></div></div></div></div><div><div><div><font size="2">Alt email: <a href="mailto:kkariuki@mmu.ac.ke" target="_blank">kkariuki@mmu.ac.ke</a></font></div></div></div><div><div><font size="2">Mobile: +2547 29 385 557</font></div></div></blockquote></div></div>
</div>