Posted by: felipe | April 22, 2013

Summer of Code project – Autosuggest search engines

This entry is part of a series of posts about some of our proposed GSoC projects. See the introductory blog post here.

Autosuggest search engines

What is it: Firefox supports adding new search engines to the search bar through the OpenSearch standard. You can see an example of this by visiting bugzilla.mozilla.org and clicking the dropdown icon in the search bar: the menu contains an entry for “Add Bugzilla@Mozilla”. However, many search websites do not advertise themselves as OpenSearch-capable, so they never get a chance to be added as a search option. We can be smarter: detect these websites and also offer them as an “Add <website>” entry in our search feature. We need, however, to be careful that our detection does not suggest websites with other kinds of forms, such as information fill-in forms, login forms, etc., and for that we need to come up with smart criteria for when a website should be considered search-capable.
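To make the idea of “criteria” a bit more concrete, here is a minimal sketch of what a first-draft heuristic could look like. It is not Firefox code and not the algorithm the project will end up with: the form descriptor (a plain object with `method`, `action` and `fields`), the name `looksLikeSearchForm` and every rule inside it are hypothetical, picked only to show how login forms and multi-field fill-in forms might be rejected.

```javascript
// Hypothetical first-draft criteria for "is this a search form?".
// Forms are plain objects so the sketch runs outside the browser:
//   { method: "get" | "post", action: string, fields: [{ type, name }] }
function looksLikeSearchForm(form) {
  // HTML forms default to GET when no method is given.
  const method = (form.method || "get").toLowerCase();
  // Search results are normally reached via GET; POST suggests a form
  // that changes state (login, checkout, comments, ...).
  if (method !== "get") return false;

  const fields = form.fields || [];
  // Password, e-mail and phone fields point to login or fill-in forms.
  if (fields.some((f) => ["password", "email", "tel"].includes(f.type))) {
    return false;
  }

  // A search form typically has exactly one free-text box.
  const textFields = fields.filter((f) => f.type === "text" || f.type === "search");
  if (textFields.length !== 1) return false;

  const field = textFields[0];
  const searchyName = /^(q|query|search|s|term|keywords?)$/i.test(field.name || "");
  const searchyAction = /search|find/i.test(form.action || "");
  // Require at least one positive hint on top of the structural checks.
  return field.type === "search" || searchyName || searchyAction;
}

// Usage
const siteSearch = { method: "get", action: "/search", fields: [{ type: "text", name: "q" }] };
const login = {
  method: "post",
  action: "/login",
  fields: [{ type: "text", name: "user" }, { type: "password", name: "pw" }],
};
console.log(looksLikeSearchForm(siteSearch)); // true
console.log(looksLikeSearchForm(login)); // false
```

Each rule is a cheap, explainable signal; a real proposal would have to justify rules like these and check them against sample websites.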

What does it involve:

  • Initial proposal for an algorithm (a set of criteria) to decide whether a form on a website is a search form. The form will be captured at the moment the user submits a query through it
  • Hypothesis testing of said algorithm against a number of example websites found by the student. The sample should contain websites that we want to detect as search-capable, as well as websites that we want to classify as not search-capable
  • Implementation of said algorithm using JavaScript
  • Retrieval of necessary data from Firefox’s form history database (SQLite), as well as storage of extra data needed by the algorithm that is currently not collected by form history (e.g. the website URL)
  • Support for running the algorithm at an appropriate time and marking the site as search-capable for future visits
  • Presenting the site in the “Add <website>” menu in future visits if it has been marked as search-capable
  • Respecting privacy choices such as Private Browsing and Clear Search Data
  • Delivering the work as patches to the Firefox codebase; ideally, by the end of the project timeline the patches should be close to a review+ state, with tests included
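The hypothesis-testing step can start out very simply: run a candidate classifier over a hand-labelled sample of websites and count where it agrees with the labels. The sketch below assumes a hypothetical sample format (`{ site, form, isSearch }`) and a classifier that takes a form and returns a boolean; `evaluateClassifier` is an illustrative name, not an existing API. Since the post stresses not suggesting non-search forms, precision (how many suggested sites really are search sites) is the number to watch most closely.

```javascript
// Compare a candidate classifier against hand-labelled sample sites.
// Sample format (an assumption): { site, form, isSearch }.
function evaluateClassifier(classify, samples) {
  let tp = 0, fp = 0, tn = 0, fn = 0;
  const mistakes = []; // sites where prediction and label disagree
  for (const { site, form, isSearch } of samples) {
    const predicted = Boolean(classify(form));
    if (predicted && isSearch) tp++;
    else if (predicted && !isSearch) { fp++; mistakes.push(site); }
    else if (!predicted && isSearch) { fn++; mistakes.push(site); }
    else tn++;
  }
  // Precision: of the sites we would suggest, how many are real search sites.
  // Recall: of the real search sites, how many we would catch.
  const precision = tp + fp === 0 ? 1 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 1 : tp / (tp + fn);
  return { tp, fp, tn, fn, precision, recall, mistakes };
}

// Usage with a deliberately naive classifier: "every GET form is a search form".
const naive = (form) => (form.method || "get").toLowerCase() === "get";
const samples = [
  { site: "search.example", isSearch: true,
    form: { method: "get", fields: [{ type: "text", name: "q" }] } },
  { site: "login.example", isSearch: false,
    form: { method: "post", fields: [{ type: "password", name: "pw" }] } },
  { site: "newsletter.example", isSearch: false,
    form: { method: "get", fields: [{ type: "email", name: "email" }] } },
];
// precision 0.5: the newsletter sign-up form is a false positive
console.log(evaluateClassifier(naive, samples));
```

The `mistakes` list is the useful part in practice: it names the sites a proposal would have to discuss when explaining how its criteria behave.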

Non-goals: This project does not include any modification of the current search UI; it is only meant to provide the backend service that detects search websites without OpenSearch support. The algorithm should not involve any crowdsourcing or interaction with the user, and no popups or prompts will be shown to the user upon detection. This is a stepping stone toward better functionality, which can be surfaced in more relevant ways in the future when we rework our search functionality (or right now through add-ons that improve it).

Where to start: You should start by learning how to download Firefox’s source code, compile it successfully on your platform, and generate a patch using Mercurial. After that, you should get familiar with the source files most relevant to this project: formSubmitListener.js catches form submissions, which is the hook for the algorithm’s data collection, and nsFormHistory.js contains the code that stores form history data to, and retrieves it from, the database. You should also look at the formhistory.sqlite file inside your Firefox profile to see what data is stored, what you can use, and what is missing (you can use the SQLite Manager add-on to open that file).
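As a rough illustration of the data-collection side, this sketch shows what a submit hook might record in addition to the name/value pairs form history already keeps, most importantly the site it came from. The record layout, the function names, the in-memory `store` array standing in for a new table in formhistory.sqlite, and the `isPrivate` flag are all assumptions; real code would hook into formSubmitListener.js and use Firefox’s own private-browsing checks.

```javascript
// Hypothetical record a submit hook could store in addition to the
// name/value pairs that form history already keeps.
function buildSubmissionRecord(pageUrl, form, now = Date.now()) {
  // Resolve relative (or missing) form actions against the page URL.
  const action = new URL(form.action || pageUrl, pageUrl);
  return {
    host: new URL(pageUrl).host, // the site the form lives on
    actionUrl: action.origin + action.pathname, // where the query is sent
    method: (form.method || "get").toLowerCase(),
    fieldNames: (form.fields || []).map((f) => f.name).filter(Boolean),
    submittedAt: now,
  };
}

// `store` stands in for a new table in formhistory.sqlite (an assumption).
function recordSubmission(store, pageUrl, form, { isPrivate = false } = {}) {
  if (isPrivate) return false; // Private Browsing: persist nothing
  store.push(buildSubmissionRecord(pageUrl, form));
  return true;
}

// Usage
const store = [];
recordSubmission(store, "https://www.example.com/", {
  action: "/search",
  fields: [{ type: "text", name: "q" }],
});
console.log(store[0].host, store[0].actionUrl);
```

Keeping the privacy check at the single point where data is written makes it easy to test that nothing is persisted in private windows.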

Skills needed: Great JS and SQL skills, and a good understanding of how HTML forms work on the web and how they are used on real websites.

What is expected in your project proposal: A great understanding of the project, its goals and non-goals, and a good idea of how to approach each of the features involved and how to time-slice them. The algorithm that you propose (it will be iterated on during the course of the project), and the rationale behind it. Evidence that you understand all the data that needs to be stored (e.g. which new fields or tables will be necessary in formhistory.sqlite) and how to store and analyze it. Example websites (at least 4) and how your criteria will behave on them. Links to open-source code you have produced (e.g. a GitHub profile), especially code involving JavaScript and SQL.
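For the “how to store and analyze it” part, one possible analysis pass groups stored submissions by host, marks a host as search-capable only after several submissions that all passed the search-form check, and then derives an OpenSearch-style URL template (OpenSearch uses `{searchTerms}` as the placeholder for the query) that a future “Add <website>” entry could use. The record fields, the threshold of 3 and the function name are hypothetical choices for illustration.

```javascript
// Hypothetical analysis pass: records look like
//   { host, actionUrl, queryField, looksLikeSearch }
// where looksLikeSearch is the result of the search-form check at submit time.
function markSearchCapableHosts(records, minSubmissions = 3) {
  // Group stored submissions by site.
  const byHost = new Map();
  for (const r of records) {
    if (!byHost.has(r.host)) byHost.set(r.host, []);
    byHost.get(r.host).push(r);
  }

  const marked = new Map(); // host -> OpenSearch-style URL template
  for (const [host, list] of byHost) {
    // Too little evidence yet: wait for more submissions.
    if (list.length < minSubmissions) continue;
    // Stay conservative: one non-search-looking submission disqualifies the host.
    if (!list.every((r) => r.looksLikeSearch)) continue;
    const latest = list[list.length - 1];
    // {searchTerms} is the OpenSearch placeholder for the user's query.
    // Assumes actionUrl has no query string; hidden fields are ignored.
    marked.set(host, `${latest.actionUrl}?${encodeURIComponent(latest.queryField)}={searchTerms}`);
  }
  return marked;
}

// Usage
const rec = (host, ok) => ({
  host, actionUrl: `https://${host}/search`, queryField: "q", looksLikeSearch: ok,
});
const result = markSearchCapableHosts([
  rec("a.example", true), rec("a.example", true), rec("a.example", true),
  rec("b.example", true), // only one submission so far
]);
console.log(result.get("a.example")); // https://a.example/search?q={searchTerms}
```

Requiring several consistent submissions trades a slower first suggestion for fewer wrong ones, which fits the goal of not offering login or fill-in forms as search engines.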

Important note: This project has been very well received so far, and lots of students have been asking questions about it and are going to submit proposals for it. Please remember that due to GSoC guidelines and limitations, only one student can be picked for the project, so please understand that a lot of proposals are coming in, and don’t be discouraged if you’re not the one selected. If you’re really interested in working on Summer of Code, we encourage you to also submit a proposal for some other project of your choice (either another Mozilla one or one from another equally awesome open-source organization). Multiple submissions are fine as long as your other proposals receive similar dedication and are not just something quickly put together to increase your chances of being picked (which wouldn’t help anyway among other quality proposals).


