Dear academic publishing industry: play nice, and we won’t crush you under our advancing wall of ice.
Google Scholar’s publisher policies insist that people searching journal articles through Google “must be offered at least a complete abstract.” Content which is restricted to subscribers can only be included “as long as you can show a complete abstract (or more) to all users who arrive from Google and Google Scholar.”
So why do all my Google Scholar searches retrieve SpringerLink and IngentaConnect pages which purport to be PDF files — even including text from the paper body in the Google summary — but upon clicking the link turn out to be generic portal pages asking for money? Whether I even get the paper’s abstract or not depends upon the IP address from which I surf.
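Curious readers can check this for themselves. Below is a minimal sketch (my own code, not any real tool) of testing whether a link that claims to be a PDF actually serves one: a HEAD request will reveal the server's reported Content-Type, and a paywall portal page masquerading as a PDF will typically report `text/html`.

```python
# Minimal sketch: does a search-result link actually serve a PDF?
# Function names and the User-Agent string are illustrative assumptions.
import urllib.request

def is_pdf_content_type(content_type: str) -> bool:
    """Classify a Content-Type header value as PDF or not."""
    return content_type.split(";")[0].strip().lower() == "application/pdf"

def serves_pdf(url: str) -> bool:
    """Issue a HEAD request and check the reported Content-Type.
    A portal page posing as a PDF will usually report text/html."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "Mozilla/5.0"}
    )
    with urllib.request.urlopen(req) as resp:
        return is_pdf_content_type(resp.headers.get("Content-Type", ""))
```

Of course, this only checks what the server tells *you*; as discussed below in the comments, what it tells Google's crawler may be another matter entirely.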
Look, I don’t expect to get online content for free. (I certainly deserve it, but that’s a different story.) Nevertheless, a little bit of forthright behavior and a willingness to play by the rules already written down would make everybody who uses the Web for academic purposes a whole lot happier.
- John Baez, “Web Spamming by Academic Publishers” (31 July 2007), The n-Category Café.
- Pierre Far, “Academic Publishers as Spammers” (2 August 2007), BlogSci.
10 thoughts on “Change My Name to Springer”
Your spirit has inspired me to write a blurb about this at the n-category cafe.
Quick note from the IngentaConnect team: you should get an abstract from us in 90% of cases. The only reasons why we wouldn’t display an abstract to you would be (a) there isn’t one [I suspect this would be unlikely for the kind of content you are likely to be looking at] or (b) the publisher chooses to restrict access to abstracts to paid-up subscribers [this may explain why your ability to see the abstract is dependent on IP address; it’s not a policy we encourage but some publishers do still choose to do this]. Next time you come across an example where you are failing to see an abstract on IngentaConnect, I’d be happy to check it out for you if you send me the citation and/or its URL.
Also, whilst IngentaConnect search results in Google Scholar *do* contain snippets from the article’s full text (because our publishers do allow Google to crawl their PDFs), they should *not* purport to be PDF files i.e. should not have the [PDF] notation. For example, see the results of this search. The second result is a link to IngentaConnect but, like all IngentaConnect links in Scholar, does not show the [PDF] notation so as not to mislead the user. This suppression of the [PDF] notation is extended to Google’s main index so you shouldn’t ever see an IngentaConnect citation in Google purporting to be a PDF file. Again, please let me know of examples you come across so that we can investigate.
We really do play by the rules by trying to ensure that the presentation of IngentaConnect-hosted content in Google (and other search engine) indexes is not misleading to users. I’d be happy to help troubleshoot any examples where it seems we are doing otherwise.
Thanks. I’ll certainly make note of any examples of misleading search results I come across. You might want to look at this BlogSci post, which notes that the Google query site:www.ingentaconnect.com intitle:"journal" returns several hits marked with [PDF] which are not PDF files.
A-ha! Thank you. We’ll take it forward with Google and try to get these results amended/updated so as not to mislead users going forward.
Hi Blake, Charlie:
Unfortunately, IngentaConnect IS a big-time cloaker. Charlie addressed some narrow points above about providing full abstracts and cleaning up server-side redirects from .pdf links and so forth, and certainly her participation here is appreciated.
However, none of that addresses the essence of Ingenta’s cloaking. Cloaking is the act of giving Google’s crawler privileged access that a regular Web user cannot have, thus damaging the fidelity of the engine (see the Wikipedia definition), and Charlie admitted that’s what her publisher clients want:
“IngentaConnect search results in Google Scholar *do* contain snippets from the article’s full text (because our publishers do allow Google to crawl their PDFs)”
THAT IS cloaking. It’s black-hat, it’s unconscionable, and IngentaConnect and everyone else who does it–from the porn biz to the “warez” biz to the academic publishing biz–deserves to be called on it.
Thank you for the post, Blake.
Oh…here’s my new Web spamming Hall of Shame list:
Thanks! Have you seen Eureka yet? Your Hall of Shame would make a valuable addition to, say, the Cloaking page.
I just noticed Eureka, and I am thrilled that an effective “netroots” response to the organized abuses of Big Publishing is starting to take form at the hands of various conscientious people. Heck…a year ago, I had a hard time just finding out that Web spamming had a name! Obviously today, though, the problem is more pervasive and entrenched and unavoidable to anyone who is a hardcore Googler.
While I am on the fence about the superiority of the for-profit publishing model in general, anyone brazen and ethically depraved enough to Google-spam me is specifically worthy of no less than complete extirpation from any pursuits involving making money.
I’ve responded to the email you sent me but for completeness I’m also pasting its content to this thread:
Whilst I understand your concern and your frustration about cloaking in general, I’d just like to make it absolutely clear that Ingenta is *not* engaging in nefarious cloaking. What we are doing is not “unethical” and “reprehensible” [or, in the words you’ve used above, “black-hat” or “unconscionable”]. Google specifically asked us to work with them in the way that we do as previously they had only been able to index our content at abstract level, and they felt being able to index full text was going to make their indices more valuable to their users.
Not only do we have Google’s full permission to work with them in the way that we do, but we also engaged in discussions with other authoritative, respected and – crucially – independent voices in the Web community to gauge their opinion. Here, for example, is what SearchEngineWatch had to say about this issue as it related to us:
“I have to stress, Dodds [Ingenta’s CTO] and the other Google Scholar participants are doing nothing wrong. They are working directly with Google, with Google’s full approval, in a way that Google rightly feels will help searchers.” (as Danny Sullivan points out, the issue is perhaps more that Google does not apply its own rules on cloaking consistently).
Sullivan – and my impression is that this is a widely-held view amongst search experts – describes what we are doing as “good [permitted] cloaking” because its motives are different to those usually guilty of cloaking (who, I agree, are “reprehensible, unethical, and very frustrating”): we are not trying to trick Google into giving us a higher site ranking, or to trick users into visiting our site.
With cloaking as it is generally understood, the user’s frustration stems from arriving at a site which turns out not to contain the content for which they were searching. When you arrive at IngentaConnect, the content you seek (and which Google has indexed) *is* there, and I would suggest that your frustration stems more from your inability to access the content on our site, than it does from the way we have made it available through Google. This is a common frustration and has led, as you may know, to the Open Access movement in scholarly research.
Please do not accuse us of not being “responsible” in our use of the Internet. We work within the boundaries set for us by publishers, who are legally entitled to restrict access to their content, and search engines, who bend their own rules when common sense shows that there is greater benefit to all parties in making an exception. I hope I have demonstrated clearly that we are not in the wrong, and that your frustration would be more accurately projected onto Google (for inconsistently applying its cloaking rules) or publishers (for operating a business model which requires end-users to pay to access valuable research). I accept your frustration, but I think the assertions you make about Ingenta are unreasonable in this case.
Thanks for your reply.
To summarize, you are deferring the blame for this cloaking phenomenon to Google. According to you, Google did not merely give Ingenta “full permission,” but “specifically asked [you]” to give the Googlebot privileged access to your full-text articles. It was Google who felt that delivering cloaked full-text articles would make their index “more valuable to users.” And Google inconsistently enforces its stated cloaking policy, because Google “bend[s] their own rules when common sense shows that there is greater benefit to all parties in making an exception.”
So let’s first talk about Google. As head of marketing for Ingenta, you are in a unique position to flesh out the nature of Google’s involvement in cloaking your content on their search engine. Evidently, it’s not just tacit approval–Google actually asked you for access, and their marketing department sent your CTO a nice frisbee (http://www.ldodds.com/blog/archives/000165.html). Did any money change hands? I believe you that Google is an active part of this problem, but I have never been able to make the necessary “cui bono” link between this phenomenon and Google. Obviously, Ingenta and other cloaking publishers stand to make a pile of dough off of Google’s free pass on the cloaking rules, as CTO Dodds describes: “With our content appearing in the Google indexes it’s been interesting to watch the referral traffic increase very nicely.” (see link above). But what’s in it for Google? Why do they benefit from bending their own anti-cloaking rules? Throw me a bone here.
Finally, I must challenge the assertion laid forth by both you and Danny Sullivan that “this type of cloaking is helpful to searchers.” You both have totally missed the boat here. Perhaps the Internet’s greatest value as an information source is that when we search it, we have leads “at our fingertips” and can browse through them and see our search strings in context to determine whether or not particular results are useful for our purposes. When someone is Google-spamming, they are contaminating search results with links that refuse to show a search string in context, but which compete for my attention with honest links because they told the Googlebot that they contained my search terms. Maybe the search terms are really there; maybe not. Maybe the cloaked links ultimately lead to information that would be useful to my purposes; maybe not. That’s beside the point.

Cloaking is fundamentally a dishonest attention-grab, a little bait-and-switch. You pull this “sleight-of-hand,” as our buddy Danny calls it, and then you ask me to trust you, to pay you, after all of which you will give me some information that may or may not ultimately be useful. That is remarkably brazen and presumptuous of you (and Google). How about this: would you trust me to tell you if the cloaked information you give me is useful, and pay me back if it isn’t?