Jan 29 2009

Extracting HTML from a WebView

Category: Android,Semantic Web,Uncategorized,WebView examplesAleksander Kmetec @ 8:30 pm

Here’s another Android WebView tutorial for those of you who are looking for a way to get the source code of a page loaded in a WebView instance.

This example is a bit more complicated than previous ones, so let me explain it step by step:

  • First, a class called MyJavaScriptInterface is defined. It implements a single public method showHTML() which displays a dialog with the HTML it receives as a parameter.
  • Then, an instance of this class is registered as a JavaScript interface called HTMLOUT. The showHTML() method can now be accessed from JavaScript like this: window.HTMLOUT.showHTML(‘…’)
  • In order to call showHTML() when the page finishes loading, a WebViewClient instance which overrides onPageFinished() is added to the WebView. When the page finises loading, this method will inject a piece of JavaScript code into the page, using the method I described in an earlier post.
  • Finally, a web page is loaded.

final Context myApp = this;

/* An instance of this class will be registered as a JavaScript interface */
class MyJavaScriptInterface 
{
    @SuppressWarnings("unused")
    public void showHTML(String html)
    {
        new AlertDialog.Builder(myApp)
            .setTitle("HTML")
            .setMessage(html)
            .setPositiveButton(android.R.string.ok, null)
        .setCancelable(false)
        .create()
        .show();
    }
}

final WebView browser = (WebView)findViewById(R.id.browser);
/* JavaScript must be enabled if you want it to work, obviously */ 
browser.getSettings().setJavaScriptEnabled(true);

/* Register a new JavaScript interface called HTMLOUT */
browser.addJavascriptInterface(new MyJavaScriptInterface(), "HTMLOUT");

/* WebViewClient must be set BEFORE calling loadUrl! */
browser.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url)
    {
        /* This call inject JavaScript into the page which just finished loading. */
        browser.loadUrl("javascript:window.HTMLOUT.showHTML('<head>'+document.getElementsByTagName('html')[0].innerHTML+'</head>');");
    }
});

/* load a web page */
browser.loadUrl("http://lexandera.com/files/jsexamples/gethtml.html");

WARNING
Unfortunately, this approach suffers from a major security hole: if your JavaScript can call showHTML(), then so can JavaScript from every other page that might get loaded into the WebView. Use with care.

Reblog this post [with Zemanta]

Tags: , , ,

Comments are closed.