Fortunately, I made it past that stage, love it now, and can't imagine ever using Windows (which needed to meditate for two minutes before even turning off) as my primary operating system again. Nonetheless, the first few weeks were rough, and the switch pretty much killed my productivity during that time.
Here are some of the software and operating system challenges I experienced, as well as how they were resolved. Hopefully some of this might be helpful to those who have recently made the switch or are considering it.
It was terrible for doing real work. This was the big one. Sure, there are great applications like Final Cut Pro that are only available on the Mac, and Adobe's suite of products runs just as well on both platforms, but the vast majority of regular people don't use those applications. Nearly everyone needs to use a word processor or spreadsheet semi-regularly, and most business folks need to work with Microsoft Office documents specifically. Unfortunately, using Office 2004 on my Mac was a poor imitation of using any version of Office on any Windows computer. The main problems were speed and stability. Since Office 2004 was written for the PowerPC platform, it runs through the Rosetta translation layer in order to work on Intel Macs. There was really no good option on this front. I tried NeoOffice, OpenOffice, Apple's own suite, and even running Windows Office through VMWare Fusion. All of these solutions were horrible. Solution: Microsoft came out with Office 2008 in January, and life is so much better. It still feels a tad slower than Office 2003 on my older ThinkPad did, but I understand that Office 2007 on Windows is no picnic either. The main benefit is that Office just works now, and my biggest potential reason for switching back to Windows is gone. Thank you Microsoft!
The web is a little broken. Web sites look different on the Mac than on Windows. One reason is that everything on a Mac looks a bit different than on Windows because fonts are rendered differently, with more aggressive default anti-aliasing. I find it makes most type look better (to my eyes at least; I know many folks who find the Mac's type to look "blurry" by comparison). Another reason the web looks different is because most sites are designed to work with Internet Explorer, and modern versions of IE are not available on the Mac. There are huge debates in the Mac community about which browser is best, but Apple's Safari is the market leader, and is the one I prefer for various reasons. However, many web sites don't render properly under Safari, and a small number of sites don't work at all. This is just pathetic when one considers that HTML was designed to specifically work across platforms. In fact, it's one of the primary reasons I convinced myself that switching to a Mac would be okay, since I use mostly web-based applications these days. I was particularly disappointed when I discovered that even many Google products didn't run as well on Safari, or on any Mac browser, as they did on IE or Firefox for Windows. Solution: Things are getting better. Thanks perhaps to efforts like the Acid tests which highlight and embarrass non-compliant browsers, it really seems like browser developers - including the IE8 team - are listening and the experience across browsers is becoming more similar. WebKit becoming a cross-platform standard is helping too.
My mouse was completely broken. This one was bizarre and completely unexpected. For some reason, my mouse didn't feel right on OS X. At first I thought it could be a device problem, so I bought a new mouse. The new mouse didn't feel any better, so I thought perhaps I wasn't yet used to the new mouse's shape, and I should get a different mouse that was shaped more like my old one. That didn't work either. I had no idea what was going on. After a little Googling I learned that OS X uses a different mouse pointer "acceleration curve" than Windows. Windows uses a flatter curve, which makes the mouse respond more naturally, whereas OS X's curve accelerates more quickly during fast movements but responds more slowly during small, precise ones. The theory is fine; unfortunately, the reality just doesn't work at all, with the pointer always feeling too fast or too slow. Solution: There are numerous solutions for this one, including buying a Microsoft Mouse which includes a driver with the Windows acceleration curve. I ended up buying SteerMouse, which lets you modify the curve manually. Some people also don't notice this at all, so for them it's a non-issue.
My phone didn't sync. I have a Windows Mobile phone, and Microsoft doesn't make ActiveSync for OS X. Solution: there are third-party applications which can sync with your Windows Mobile device, such as Missing Sync. I ended up using ActiveSync under VMWare Fusion.
There's no standard uninstall application. Initially I thought OS X's installation system was brilliant. You just drag an application into the Applications folder, and it's installed. If you want to uninstall, you delete it from the same folder. Unfortunately, there are various files outside of that folder which some applications will modify, which of course will not be reverted if you just delete the application package. Most well-behaved applications provide their own uninstall utility to clean up these files, however some don't. Solution: Again, there are third-party applications such as AppZapper which fill this need. I've found that not installing misbehaving applications, which are definitely in the minority, is an even simpler solution.
CTRL-X and CTRL-V don't work for cutting and pasting. For some reason, Apple thinks these keystrokes ought to be COMMAND-X and COMMAND-V. In fact, a lot of what one does with CTRL on Windows is done instead with COMMAND on the Mac. This might make sense if it weren't for the fact that Mac keyboards also have a CTRL key. [Note: As has been pointed out by several folks, the reason that Apple uses COMMAND-X instead of CTRL-X is because Apple invented this shortcut, and Microsoft copied it and used CTRL instead of COMMAND. Of course, now 95% of the world uses CTRL-X, which one must use on web-based applications even using a Mac.] Solution: OS X lets you swap the COMMAND and CTRL keys, which is what I've done. Unfortunately there are a small number of applications for which this doesn't work, and for those you just have to remember to do the reverse.
The HOME and END keys don't work correctly. I actually didn't discover this problem until I hooked up my external keyboard, since the MacBook Pro doesn't even have home and end keys! When I did start to use those keys, I discovered that not only do they not behave the way they do on Windows, but they actually behave differently from application to application on the Mac. In most applications, home and end move to the beginning and end of the page. But in some applications and contexts, they behave like they do on Windows, going to the beginning or end of the line. This is just ridiculous, especially if you use those keys a lot. Solution: I didn't actually find any perfect solutions to this. There are keyboard remapping techniques that you can use but these don't appear to work for all applications (or even all contexts within the same application). I ended up ditching my initial external keyboard for the Apple wireless keyboard, which actually doesn't have home and end keys. As a result, I finally migrated over to the Apple equivalent keystrokes: COMMAND-left and COMMAND-right (or in my case, CTRL-left and CTRL-right).
Importing email is painful. Before Gmail, I stored all of my personal and work mail locally, in multi-gigabyte Outlook PST files. I was stunned to discover that Microsoft doesn't make Outlook for the Mac. To make matters worse, Entourage, their Mac Office equivalent, can't import Outlook PST files. After playing around with Entourage and comparing it to Apple Mail, I decided to use the latter. Unfortunately, Apple Mail didn't provide any simple import solutions either. Solution: I ended up buying a $10 application called O2M which did the trick. Unfortunately, because my files were so large, it took more than a day, with lots of stopping and restarting, to complete the conversion.
It's just as buggy as Windows. No, OS X is not generally unstable. It's a very solid operating system, as most UNIX flavors tend to be. But I'm one of the rare users that didn't have many stability problems with Windows XP. When it would crash, it was typically an application problem, not an operating system issue. Of course, applications crash on OS X also, and some crash quite a bit. Solution: There's not much to say here except to hope that all software applications, on all operating systems, become more stable over time. That's a nice thought.
Aside from the above issues, there are countless additional quirks of the Mac that it takes time to get used to, but I would say there are a lot more of these which pleasantly surprise me than frustrate me. If you have any useful switching tips, especially any better suggestions than what I've listed above, I'd love to hear them.
First of all, he said Fair Isaac has not come up with an estimate of click fraud in the industry - and in fact only analyzed data from a handful (fewer than ten) of advertisers. And even this finding pertains only to the syndication networks and not to search engines, where the majority of pay-per-click advertising occurs. In fact, they found that the rate of "pathological activity" on search engines was "negligible" ("a few percent or less"). This would imply a combined click fraud rate in the single digits even in their sample set - which they said they would certainly not generalize to the entire industry.
Fair Isaac indicated that they needed a lot more data before they could conduct a meaningful study. They also recognized the need for clean data, acknowledging the importance of using auto-tagging to remove fictitious clicks as we had mentioned to them previously. Unfortunately none of the advertisers in their initial survey were using auto-tagging to fix this problem, which results in inflated click fraud estimates.
We're continuing to talk and I hope we'll be able to help them further understand the challenges relating to click fraud detection, which is completely different from fraud detection in other industries. The biggest difference is the fact that it requires unsupervised analysis, something they told us they are aware of. They won't share their methodologies with us to protect their intellectual property of course, but I get the feeling that they may not be aware of many other factors relating to the specific behavior of the Internet, web browsers, etc., which make this much more than just a generic task for existing fraud tools from other industries. I'm looking forward to talking to them more as their study progresses, and hopefully takes these and other issues into account.
Update: Search Engine Watch has additional details on this at "Fair Isaac Click Fraud Report Spreads False Alarm".
1) Advertisers should never pay for double clicks or repeat clicks from the same session.
I agree that advertisers should not be charged for double clicks. While the activity of comparison shopping is a common reason that multiple clicks to the same ad can occur within a short period of time, if the clicks occur so close together that they could only be caused by double-clicking or malicious repeated clicking, the extra clicks clearly provide no value to the advertiser.
But "same session" is not defined here, and it would be bad for advertisers to define it in a way that would exclude comparison shopping. For example, if publishers and search engines decided not to charge for multiple clicks on an ad within the same day, they would redesign their ad systems to not show that advertiser's ad the second time a user searched on the same keyword, since showing ads which produce no revenue is not desirable. But this would deny that advertiser the opportunity to have a user who was comparison shopping revisit their site, and that would rob them of sales opportunities.
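To make the distinction concrete, here is a minimal Python sketch of one way to drop only those repeat clicks that arrive too close together to be deliberate, while still counting a later return visit from a comparison shopper. The function name, the tuple layout, and the 2-second threshold are all hypothetical choices for illustration; this is not a description of how any search engine actually filters clicks.

```python
from typing import Dict, Iterable, List, Tuple

# (user_id, ad_id, timestamp in seconds) - a hypothetical click record
Click = Tuple[str, str, float]


def billable_clicks(clicks: Iterable[Click],
                    min_gap_seconds: float = 2.0) -> List[Click]:
    """Keep a click unless it follows the previous click by the same user
    on the same ad by less than min_gap_seconds (an assumed threshold)."""
    last_seen: Dict[Tuple[str, str], float] = {}
    billable: List[Click] = []
    for user_id, ad_id, ts in sorted(clicks, key=lambda c: c[2]):
        key = (user_id, ad_id)
        previous = last_seen.get(key)
        if previous is None or ts - previous >= min_gap_seconds:
            billable.append((user_id, ad_id, ts))
        # Always remember the latest click, so a rapid burst of
        # repeated clicks is suppressed as a whole.
        last_seen[key] = ts
    return billable
```

With a short threshold like this, a double click collapses into one billable click, but a second click ten minutes later (the comparison shopping case) is still counted.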
2) Advertisers should never pay for traffic from bots.
This request surprised me, since I am not aware of any company in the entire industry which has a policy of charging for clicks made by known bots. We obviously monitor for bot activity and have lists of known bots which we maintain. The difficulty is in knowing whether something is a bot. There are bots which are easily identifiable (for example, if their User-Agent value announces them as a bot) but there are also bots which nobody can identify. We have systems and processes to detect and identify bots (as well as other click fraud methods, such as click farms), but even in cases where traffic cannot be identified as coming from a specific method, our overall detection approach is still effective because it is based on analyzing data related to the clicks themselves.
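The easy case mentioned above, a bot that announces itself in its User-Agent value, can be sketched in a few lines of Python. The token list and function name here are hypothetical, and a check like this by definition catches only self-declared bots; it does nothing about bots that present an ordinary browser User-Agent.

```python
# Hypothetical substrings that self-declared bots commonly include
# in their User-Agent value; a real list would be maintained over time.
KNOWN_BOT_TOKENS = ("bot", "crawler", "spider")


def is_self_declared_bot(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a known bot token."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)
```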
3) Advertisers should have control over where, when and to whom ads are distributed.
Definitely. We provide multiple levels of control, ranging from the coarse granularity offered by geotargeting, or opting in or out of syndication or the content network, to more detailed controls such as opting out of specific URLs, which we're the only major search engine to provide at the moment. We are also going to be releasing the ability to prevent ads from showing to specified IP addresses (see #4) in the next month.
4) Domain and IP exclusion lists from search providers should be easy to use and maintain.
I agree. We currently have URL/domain exclusion features and will be launching IP exclusion in the next month. We have and will continue to work hard to ensure features like these are easy to use. At the same time, it is important to provide advertisers with more accurate information about domains and IPs so they can make informed decisions and are not misled into thinking that Google expects them to maintain such lists in order to protect against click fraud. These are features which provide targeting controls to advertisers and are more similar to geotargeting than anything related to invalid click detection.
5) Search providers should provide advertisers detailed referrer information on all traffic that is billed.
I agree with this, and we are currently working on ways to provide advertisers with more transparency into where their ads are placed. Advertisers can already obtain referrer URLs from their own web logs, of course.
6) Advertisers should never pay for traffic originating outside the specified geo-targeted settings.
I agree with this also, but we need to be clear on what geotargeting is. Geotargeting is based on IP address and other signals and works very well, but is not perfect. There are some instances of IP addresses where geographic location cannot be determined. In addition, when an advertiser targets a specific country, our policy is to show their ads to users who are in that country as well as to users who opt into results from that country. For example, if a user chooses to use a country-specific Google site such as our French site www.google.fr, we will show them ads geotargeted to France even if their computer is located elsewhere. (A side note: Google does not have a US-specific site, and using Google.com from non-US countries will not result in the user opting into US results and ads. Instead, the geotargeting in that case will be based only on their machine location). Another example of user choice taking precedence over machine location is when a user actually types in a query which indicates they are interested in ads relevant to a specific geography, such as "paris france travel".
7) Search engines should adopt third-party validation for click quality as other media companies have done for their audience validation.
We are in favor of submitting our systems to an audit by a trusted third party, and are working with the other members of the IAB Click Measurement Working Group to set this up. The audit will likely be administered through the Media Rating Council, the organization which audits Nielsen and Arbitron. Third-party click fraud auditing firms should also be audited through the MRC to ensure they do not repeat the types of errors that have happened in the past, when fictitious clicks were included in advertiser reports. Those reports misled advertisers and advised them to make decisions which could significantly damage their businesses.
A simple example of continuing serious accounting issues with third parties: several firms have admitted to overcounting errors in the past due to fictitious clicks and have adopted Google's auto-tagging support in their systems to begin to correct the problem for analysis they do for Google advertisers. While they claim to have dealt with the problem of fictitious clicks, some of the same firms continue to publicize estimates of industry click fraud rates which include networks (such as Yahoo and MSN) where it is not yet possible to distinguish between fictitious clicks and real clicks (due to lack of support similar to Google's auto-tagging).
8) Search providers should provide an easy mechanism to reconcile paid clicks on a monthly basis.
Definitely. Google provides this through auto-tagging, which allows advertisers (and third party analytics firms, including click fraud auditing firms) to reconcile the clicks they see in their logs with the number of clicks in their AdWords reports. Using auto-tagging, advertisers (and third-party firms) are able to get accurate information on how many clicks occurred on their campaigns and how those figures compare to the activity seen in their logs. This allows them to properly count clicks and avoid the problem of fictitious clicks we have discussed before.
Google also provides our advertisers with reports of the daily number of invalid clicks on their campaigns, which is what they (and third-party auditing firms) need to verify whether the number of clicks they thought were suspicious was less than or equal to the number of clicks we already filtered out for them that day.
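That daily comparison is simple enough to sketch in Python. The function and the dictionary layout below are hypothetical; the only inputs assumed are a per-day count of clicks the advertiser considers suspicious and the per-day invalid click count from the report.

```python
from typing import Dict


def days_needing_review(suspicious_by_day: Dict[str, int],
                        invalid_by_day: Dict[str, int]) -> Dict[str, int]:
    """Return, for each day, how many suspicious clicks exceed the number
    of clicks already filtered as invalid. Days where the suspicious
    count is covered by the invalid count are omitted."""
    excess: Dict[str, int] = {}
    for day, suspicious in suspicious_by_day.items():
        filtered = invalid_by_day.get(day, 0)
        if suspicious > filtered:
            excess[day] = suspicious - filtered
    return excess
```

Only the days this returns would be worth raising for investigation; on every other day the suspicious clicks could all have been among those already filtered and not charged.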
We are the only company in the industry that currently provides either of these features, but we have been working on evangelizing them to our competitors and the industry overall. MSN has announced that they will be releasing their version of invalid clicks reporting later this year, but none of the other major search engines has yet adopted a feature like auto-tagging. We hope both of these will become part of the IAB standards. We have also been working on plans to share detailed click information, similar to a phone bill as many in the industry have pointed out. It would contain information such as the IP addresses, time, and cost associated with individual clicks. It would not contain flags for which specific clicks were detected as invalid (and not charged for), since that would make it simple for a fraudster to pose as an advertiser, run an experiment with millions of clicks, and then attempt to reverse engineer our system. But this type of report would provide advertisers further transparency into which clicks occurred on their ads and help them more easily identify discrepancies between their systems and ours.
Many thanks to the advertisers who provided their suggestions, as well as to all of the other groups that send us ideas regularly. We benefit greatly from the feedback our advertisers provide us, as it helps us constantly improve our systems and customer service, and we would always like to get more. In fact, we are hosting our first advertiser forum dedicated exclusively to invalid clicks at Google headquarters this coming week. In it, we will be meeting with several dozen advertisers, both large and not-so-large, to discuss their concerns, share information about our invalid click detection methods and policies, and come up with ways to continue to deliver a great advertising experience on Google.
Botnets have of course been around for many years, and have been used most commonly for activities like denial of service attacks. We have also seen them used for click fraud. There are many different ways that click fraud is attempted, and the use of botnets generally represents one of the more sophisticated methods. At a basic level, the main benefit of a botnet to fraudsters is the use of many diverse IP addresses and other machine-specific signals. By utilizing thousands of hijacked IPs, a fraudster hopes that their attack will be difficult to catch. Of course, IP address is only one of hundreds of factors we analyze when looking for evidence of click fraud. Some sophisticated fraudsters realize this, and program their botnets to behave in more complex and subtle ways than just randomizing IPs (as Clickbot.A demonstrates).
One reason we're publishing this paper is to continue to share more information on the types of analysis we do to protect our advertisers against click fraud. But an even more important reason is to provide greater understanding of a challenging area the entire Internet community should work together to manage. The bad guys share their information with each other, and so should we. We hope to be able to discuss more publicly in the future ourselves, and also we hope that other security-related companies will share similar case studies and findings, which will end up benefitting everyone. The concluding observations and recommendations from the paper are worth repeating here:
You can read more about the Clickbot.A case at our AdWords Blog post, and you can access Neil's paper, which he co-wrote with Mike Stoppelman and other team members, here. Incidentally, Neil is also the author of the recently published "Foundations of Security: What Every Programmer Needs to Know", which is a great reference as well as introduction to security methods.
However, click fraud is real, and it's definitely one of the main concerns of advertisers who contact me and our Click Quality team. But in the click fraud sessions at the Search Engine Strategies conferences and elsewhere, I often hear from advertisers who tell me that they don't know how to correctly diagnose whether they've been affected by click fraud, or how to contact Google to request an investigation.
Well, there's a great post on the AdWords blog about just that from Julian, a long-time member of our Click Quality team. In it, he describes many common cases which can actually be misdiagnosed as click fraud – such as normal traffic or ROI fluctuations (caused by other sources), web log discrepancies due to technical issues, or multiple clicks from the same IP address due to large shared ISP proxies. The Click Quality team helps diagnose these cases every day, but you as an advertiser can diagnose them too. The most important thing to remember is that undetected click fraud shows up as a drop in ROI that you can't explain by other causes. The more carefully and granularly you track your campaign, the better you can optimize its performance – including managing issues related to click fraud.
In the rare event you find that your campaign may have been affected by undetected click fraud, our Click Quality team definitely wants to hear from you. There's a link at the end of the post to the form you can use to contact them, which I'll repeat for good measure here.
To begin, where do third-party click fraud numbers come from? At Google, whenever we detect malicious activity against an advertiser's account, we mark those clicks as invalid, and thus don't charge the advertiser for them. We utilize a number of different automated techniques and algorithms, as well as proactive manual analysis, to do this, analyzing hundreds of different factors. The analysis that we see from third-party auditing firms (including ClickForensics) seems to essentially rely on just one factor, which we call IP frequency. IP frequency is the number of times an IP address clicks within a certain time window. If it clicks too many times, it could be click fraud. On our end, this is a very simple rule which runs in an automated fashion, protecting Google advertisers 24/7. Third-party firms sometimes find the same suspicious IP frequency patterns that our systems do, and include them in their click fraud reports - leading advertisers to request refunds for clicks they were never charged for in the first place.
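For readers who want to see what an IP frequency rule looks like, here is a minimal Python sketch. The window length, the click limit, and the function name are hypothetical values chosen for illustration, not the thresholds any real system uses.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Set, Tuple


def flag_ip_frequency(clicks: Iterable[Tuple[str, float]],
                      window_seconds: float = 60.0,
                      max_clicks: int = 3) -> Set[str]:
    """Flag any IP with more than max_clicks clicks inside a sliding
    window of window_seconds. Each click is (ip_address, timestamp)."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    flagged: Set[str] = set()
    for ip, ts in sorted(clicks, key=lambda c: c[1]):
        window = recent[ip]
        window.append(ts)
        # Drop timestamps that have fallen out of the window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_clicks:
            flagged.add(ip)
    return flagged
```

A rule this simple looks only at one factor, which is why it works as one automated check among many rather than as a complete analysis on its own.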
But that is actually not even the most common problem with their analyses. What is far more common is that the reports we receive from them ask for refunds for clicks which do not even exist. This more serious problem comes from the issues we addressed in our August report on fictitious clicks. In that report, we demonstrated the limits of web log based analysis for any analytics purpose (including click fraud analysis) due to the way Internet Explorer, Firefox and other browsers work. Unfortunately, that was a very technical report, which was difficult for many readers to parse. I'll try to provide a simpler explanation here.
Here's the problem: web logs, whether generated by an advertiser, or by third-party code on an advertiser's site, cannot directly track ad clicks. Instead, they track visits to a special landing page URL on the advertiser's site (e.g. http://example.com/?adwords ) as a proxy for how many ad clicks occurred. The assumption they're relying upon is that each visit to that URL corresponds to a unique click, and vice versa. But in practice this is not the case. Once a user visits that page, they often browse through the site, navigating through sub pages, and then return to the original landing page by hitting the back button. When the landing page is reloaded in the browser, it appears in the web log as though additional ad "clicks" are occurring. Google can count ad clicks reliably because a click on a Google ad causes the web browser to contact Google, and we then redirect it to the advertiser's landing page. A reload of the advertiser's landing page does not contact Google again. In addition, the referrer URL which is passed by the browser when users hit the back button is actually the original referrer URL (which says the page came from an ad click) which gets cached, so there is no analysis which can be done based on logs alone which can resolve this. This is where the fictitious clicks come from.
When we analyze data from web logs under these default conditions, we find that on average it leads to a 40% inflation of click estimates. You can think of it this way: if an average of 1000 clicks occurred, a log based analysis would estimate on average that there were 1400 clicks, 400 of which are fictitious and did not actually occur.
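The arithmetic can be written out as a short Python sketch. The function name is hypothetical, and the 40% figure is the average quoted above rather than a constant that holds for any particular advertiser.

```python
from typing import Tuple


def log_based_estimate(actual_clicks: int,
                       inflation_percent: int = 40) -> Tuple[int, int, float]:
    """Return (estimated clicks, fictitious clicks, fictitious share of
    the log-based estimate) for a given average inflation percentage."""
    # Integer arithmetic avoids floating-point rounding in the counts.
    estimated = actual_clicks * (100 + inflation_percent) // 100
    fictitious = estimated - actual_clicks
    share = fictitious / estimated if estimated else 0.0
    return estimated, fictitious, share


estimated, fictitious, share = log_based_estimate(1000)
print(estimated, fictitious, round(share, 3))  # 1400 400 0.286
```

Put another way, under that average roughly 29% of the "clicks" in a log-based count never happened.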
Now consider the principal analytical tool of third-party click fraud firms: IP frequency. When they see a user browsing through the site, and reloading the landing page multiple times in a short time window, they will classify it as click fraud - even though those "clicks" do not actually exist. It also results in the misclassification of advertisers' best users (the ones who are spending time browsing through their sites) as "fraudulent".
Thus, while click estimates were inflated by 40% on average, click fraud estimates were inflated by much, much higher amounts. As we detailed in our report, we found cases of firms reporting click fraud rates above 100% in some instances due to this problem. We also found that in other instances, clicks classified as "click fraud" by third-party firms produced sales at the same rate as the "good" clicks. In other words, the identification of click fraud by third-party firms was much worse than imprecise - it was not even in the right ballpark, with nearly all of the "bad" clicks they identified actually being fictitious.
The net result was that advertisers were consistently being given false data from reports they trusted, which would actually hurt their advertising campaigns if they acted on them. For example, if an advertiser is told certain keywords have higher "fraud rates", they are likely to change their campaign to eliminate spending on those keywords in favor of others, hurting the performance of their campaigns when this information is false. The damage this can do to advertisers' businesses can be quite large.
So is there a solution to this? Yes. Third-party analytics (not click fraud) firms have been aware of the page reload issue for many years, and generally use redirects (rather than web log based tracking) to avoid it. If one is tied to using web site logs (or landing page code generating logs) however, the only solution is to use the AdWords auto-tagging feature. Auto-tagging has been available since 2005, and is a feature which appends a unique ID to the landing page URL for every click, so that the cases of (a) multiple clicks and (b) multiple reloads of the landing page can be easily distinguished.
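Here is a minimal Python sketch of how a unique ID on the landing page URL separates real clicks from reloads in a web log. It assumes the ID arrives in a query parameter named gclid (the parameter name auto-tagging is known for; treat it as an assumption for any given log), and the function name and sample URLs are hypothetical.

```python
from typing import Iterable, Tuple
from urllib.parse import parse_qs, urlparse


def count_clicks(logged_urls: Iterable[str],
                 id_param: str = "gclid") -> Tuple[int, int]:
    """Return (tagged landing page hits, distinct click IDs).
    Hits without the ID parameter are ignored."""
    hits = 0
    click_ids = set()
    for url in logged_urls:
        ids = parse_qs(urlparse(url).query).get(id_param)
        if not ids:
            continue
        hits += 1
        click_ids.add(ids[0])
    return hits, len(click_ids)


log = [
    "http://example.com/?gclid=abc123",  # ad click
    "http://example.com/?gclid=abc123",  # back button reload, same ID
    "http://example.com/?gclid=def456",  # a second, separate ad click
]
print(count_clicks(log))  # (3, 2)
```

The first number is what a naive log-based count would report; the second is the number of clicks that actually occurred.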
Two of the three firms we identified in our report, AdWatcher and ClickFacts, have not made any changes we're aware of. That's discouraging to say the least. ClickForensics claims to have fixed this problem a couple of months ago by requiring their AdWords clients to use auto-tagging, yet despite such a significant change in methodology, their new numbers are nearly the same as their old numbers. Perhaps it hasn't yet been fully or correctly utilized, so the significant corrective drop in their numbers is yet to come. Or perhaps their network is heavily skewed toward non-Google advertisers, and thus they still cannot correct the problem until Yahoo, MSN and others implement their own versions of auto-tagging. Until then, considering that the total number of clicks they're counting could be off by as much as 40%, and their click fraud estimates could be off by much more, there's very little meaning in a difference of 0.1% from Q2 to Q4 - or in any of their other inferred statistics. But most importantly, the fact that they don't take into account the amount that Google already protects advertisers against means that they're not even trying to measure actual click fraud.
There was a press release yesterday from ClickForensics stating that their quarterly measure of click fraud for Q4 was 14.2%. They also stated that this was the year's "highest level" (up from 14.1% in Q2) and that the click fraud rate for search engine content networks was 19.2%. This morning there was a competing press release from Incremental Advantage and several other click fraud firms, stating that "Click Fraud Cost Internet Advertisers $666 Million in 2006".
On a basic level, these numbers are much higher than what we see at Google, and are not at all representative of the actual statistics of our network. Most savvy advertisers and industry pundits are already aware of this (see "Why We Can't Trust Click Fraud Numbers" in yesterday's WebProNews), and generally haven't paid much attention to these estimates for a while.
However, these stats are still out there and there are some things everyone should keep in mind when reviewing them. Specifically:
The key point here is not that their numbers are "too high". The point is that their data collection methods are inherently flawed and any resemblance their numbers could have to reality would be coincidental. Even so, given that they are not measuring click fraud (see point #3), they apparently don't intend their numbers to reflect reality.
Click fraud protection is something we take very seriously at Google, and it requires a high level of scientific rigor to do well. It's frustrating to see basic mistakes being made by firms selling "additional protection" to AdWords advertisers - in essence, charging them money for advice which can actually hurt their businesses. I've spoken with many firms and a number of academics interested in this area, and the ones who are investing in serious R&D efforts recognize the limitations of their data and analysis and have not been focusing on publicizing unsupportable and flawed numbers such as the above. We're very supportive of those efforts (and of scientific research in this area in general) and we'll continue to work closely with them.
For more information about Google's actual metrics, you can see my previous posts here and here.
Update: I've posted a second part to this post, with more technical details on points #1 and #2.