Jan 29 2009

Extracting HTML from a WebView

Here’s another Android WebView tutorial for those of you who are looking for a way to get the source code of a page loaded in a WebView instance.

This example is a bit more complicated than previous ones, so let me explain it step by step:

  • First, a class called MyJavaScriptInterface is defined. It implements a single public method showHTML() which displays a dialog with the HTML it receives as a parameter.
  • Then, an instance of this class is registered as a JavaScript interface called HTMLOUT. The showHTML() method can now be accessed from JavaScript like this: window.HTMLOUT.showHTML('…')
  • In order to call showHTML() when the page finishes loading, a WebViewClient instance which overrides onPageFinished() is added to the WebView. When the page finishes loading, this method will inject a piece of JavaScript code into the page, using the method I described in an earlier post.
  • Finally, a web page is loaded.

final Context myApp = this;

/* An instance of this class will be registered as a JavaScript interface */
class MyJavaScriptInterface
{
    /* The annotation is required on API level 17 and later */
    @JavascriptInterface
    public void showHTML(String html)
    {
        new AlertDialog.Builder(myApp)
            .setMessage(html)
            .setPositiveButton(android.R.string.ok, null)
            .show();
    }
}

final WebView browser = (WebView)findViewById(R.id.browser);
/* JavaScript must be enabled if you want it to work, obviously */
browser.getSettings().setJavaScriptEnabled(true);

/* Register a new JavaScript interface called HTMLOUT */
browser.addJavascriptInterface(new MyJavaScriptInterface(), "HTMLOUT");

/* WebViewClient must be set BEFORE calling loadUrl! */
browser.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url)
    {
        /* This call injects JavaScript into the page which just finished loading. */
        view.loadUrl("javascript:window.HTMLOUT.showHTML(document.documentElement.outerHTML);");
    }
});

/* Load a web page (placeholder URL; substitute the page you want to inspect) */
browser.loadUrl("https://example.com/");
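
To make the injected call a little more concrete, here is a minimal plain-Java sketch of how the javascript: URL could be assembled from the interface name and the method name. It assumes the injection is done by passing a javascript: URL to loadUrl(), and that the page's markup is read with document.documentElement.outerHTML. The JsBridgeUrl class and its build() helper are hypothetical names made up for this illustration; they are not part of the Android SDK.

```java
/* Hypothetical helper, not part of the Android SDK. */
final class JsBridgeUrl {

    /* Builds "javascript:window.<interface>.<method>(<argument>);" */
    static String build(String interfaceName, String methodName, String jsArgument) {
        if (!isIdentifier(interfaceName) || !isIdentifier(methodName)) {
            throw new IllegalArgumentException("not a valid JavaScript identifier");
        }
        return "javascript:window." + interfaceName + "." + methodName
                + "(" + jsArgument + ");";
    }

    /* Accepts simple ASCII identifiers only, which is enough for this sketch. */
    private static boolean isIdentifier(String s) {
        return s != null && s.matches("[A-Za-z_$][A-Za-z0-9_$]*");
    }
}
```

On Android, the string returned by JsBridgeUrl.build("HTMLOUT", "showHTML", "document.documentElement.outerHTML") would be handed to loadUrl() from inside onPageFinished().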

Unfortunately, this approach suffers from a major security hole: if your JavaScript can call showHTML(), then so can JavaScript from every other page that might get loaded into the WebView. Use with care.
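
One way to reduce the exposure, sketched here under some assumptions, is to let the WebView load only pages whose host is on an allowlist you control. The TrustedHosts class and isTrusted() method are hypothetical names for this illustration, and example.com is a placeholder for whatever hosts you actually trust.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/* Hypothetical helper: decides whether a URL may be loaded into the WebView. */
final class TrustedHosts {

    /* Placeholder hosts; replace with the hosts you actually trust. */
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("example.com", "www.example.com"));

    static boolean isTrusted(String url) {
        try {
            URI uri = new URI(url);
            String host = uri.getHost();
            /* Require HTTPS and an exact host match (no suffix matching). */
            return "https".equalsIgnoreCase(uri.getScheme())
                    && host != null
                    && ALLOWED.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            /* Anything that does not parse is treated as untrusted. */
            return false;
        }
    }
}
```

A check like this could be consulted before calling loadUrl(), or from WebViewClient.shouldOverrideUrlLoading() to block navigation to untrusted pages. It narrows the hole rather than closing it, since content embedded in a trusted page (frames, for example) may still reach the interface.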

