"Phrases in English"
Evolutionary Changes

7 December 2010
Problem with occasional endless loop while processing Find concordances resolved.
November 2010
After extensive testing of several alternative providers, the site was moved to a faster, more reliable server in Düsseldorf (winner: Alvotech.de). At least one additional server will be maintained indefinitely as a backup. Please give feedback and report any concerns or errors.
18 March 2010
Migration to new server with far more memory, storage capacity and processing power completed.
20-27 May 2008
Site revamped to support all current-generation browsers. All pages and query interfaces have now been tested with Firefox 2.x and Internet Explorer 6.x.  Testing with Safari, Opera and IE 7 will take place in June, to be followed by rebuilding the underlying databases with the BNC XML Edition, which was released in 2007.
26 September 2006
Missing chargram tables restored
(dropped when database was moved to a new server)
13 December 2005
Regular Expressions (RegExes) provide a powerful way to match multiple phrases with a single query.  3 experimental pages with query by RegEx have been added:

Drill-down queries display "expansions", i.e. longer phrases / phrase-frames incorporating your query terms. For each query phrase, the results indicate the number and percentage of its occurrences that are accounted for by each specific longer phrase. Your suggestions for this feature are most welcome.
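The drill-down calculation can be illustrated roughly as follows. This is only a sketch of the idea, not the code running on the site: the function name drill_down, the in-memory dictionary standing in for the database, and all frequencies shown are made up for the example.

```python
def drill_down(query, query_freq, ngram_freqs):
    """List longer phrases containing `query`, with count and percentage.

    `ngram_freqs` maps phrases to frequencies (hypothetical data standing
    in for the real database). The percentage is each expansion's share of
    the query's own frequency.
    """
    query_words = query.split()
    k = len(query_words)
    rows = []
    for phrase, freq in ngram_freqs.items():
        words = phrase.split()
        if len(words) <= k:
            continue  # only phrases longer than the query are expansions
        # Check whether the query occurs as a contiguous run of words.
        contains = any(
            words[i:i + k] == query_words
            for i in range(len(words) - k + 1)
        )
        if contains:
            rows.append((phrase, freq, round(100.0 * freq / query_freq, 1)))
    # Most frequent expansions first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Made-up frequencies, for illustration only.
sample = {
    "at the end": 80,
    "the end of": 120,
    "in the end": 30,
    "the other end": 15,
}
for phrase, freq, pct in drill_down("the end", 200, sample):
    print(f"{phrase}\t{freq}\t{pct}%")
```

Percentages need not add up to 100, since one occurrence of the query can sit inside several different expansions (e.g. both "at the end" and "the end of").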

4 March 2005
Random concordances of n-grams fixed to eliminate false matches from MySQL FULLTEXT query.
15 December 2004
Chargrams now draw on the updated database. The range of values of n has been increased to 1-8, and the entire range can be searched simultaneously. Finally, the minimum frequency cutoff for inclusion has been lowered to 10, which permits study of more unusual letter combinations.
7 December 2004
Random concordances now offer an interactive display option derived from KWiCFinder. Since implementation is still in progress, some interactive features are not available yet.
17 September 2004
Phrase-frames moved to the new, more complete databases as well, which should eliminate discrepancies between frequencies reported for n-grams and phrase-frames. P-frames are now available for n in the range 2-8.
26 August 2004

Databases for n-grams and POS-grams rebuilt to include some tokens omitted from the original database. Many thanks to Michael Stubbs for pointing out the discrepancies between actual n-gram counts and those in the PIE database! During the transition period 26 August - 17 September, frequencies reported for phrase-frames were potentially somewhat lower than the sum of the n-gram variants.
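To show why phrase-frame frequencies should equal the sum of their n-gram variants, here is a minimal sketch. It assumes a phrase-frame is a set of n-grams that are identical except for the word in one slot (marked * below); the function names and the frequencies are invented for the example and do not reflect the site's actual code or data.

```python
from collections import defaultdict


def phrase_frames(ngram_freqs):
    """Group n-grams into frames by replacing one word at a time with '*'."""
    frames = defaultdict(dict)
    for ngram, freq in ngram_freqs.items():
        words = ngram.split()
        for slot in range(len(words)):
            frame = " ".join(words[:slot] + ["*"] + words[slot + 1:])
            frames[frame][ngram] = freq
    return frames


def frame_frequency(frames, frame):
    """A frame's frequency is the sum of its variants' frequencies."""
    return sum(frames.get(frame, {}).values())


# Made-up frequencies, for illustration only.
sample = {
    "at the end of": 50,
    "at the top of": 20,
    "at the bottom of": 10,
}
frames = phrase_frames(sample)
print(frame_frequency(frames, "at the * of"))
```

If any variant is missing from the table the frames are built from, the frame total falls below the sum reported for the n-grams, which is the kind of discrepancy described above.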

25 May 2004
AAACL presentation "Phrases in English: Present and Future of an Online Database for English Phraseology" added. This presentation gives further details on the implementation and near-term plans for this site. Formats: PowerPoint (requires PowerPoint) | HTML
1 April 2004
Optimized code implemented for phrase-frame queries (see 5 Jan 04). 7- and 8-frames will be added to database when time permits.
31 March 2004
"Simple Search", "Explore N-Grams" and "Explore POS-Grams" extended to include all 1-8-grams occurring 3 or more times in the BNC.  Select "Tally all Occurrences" in POS-grams to order results by absolute frequency in the BNC and to see frequencies for all POS-grams occurring one or more times.
25-27 March 2004
"Explore POS-Grams" implemented to investigate the frequencies of Part Of Speech patterns by number of Types or Tokens. A dedicated POS database including all POS-grams in the BNC was built. New feature in all search results:  put cursor over POS codes for a brief explanation of what they stand for.
13 March 2004
"Tamecards" matching for hyphens added to Simple Search. Users can optionally specify queries containing - (hyphen) to match variants with a space and nothing respectively.  For example, with these options, data-base can match data base and / or database respectively.
11 March 2004
New dropdown menu gives direct access to all pages from every other page. Click on heading to display, click again to hide. Simple search issues with optional wildwords for which POS tags are specified were solved.
10 March 2004
Data and plot of "word" length distribution by type and token frequency added.
21-23 February 2004
Option to match POS codes in random concordances implemented.  Since this option can slow searches down considerably, it should be checked only when necessary.  "Next" and "Back" buttons in search results now use POST method to reduce clutter in the browser's history record.
17 February 2004
Matching POS codes and multiple word-forms as well as excluding specific word-forms and POS codes added to Simple Search. In addition, "issues" with matching multiple forms under certain circumstances may have been resolved [feedback on any problems encouraged!].  Other refinements to this interface continue.
14 February 2004
(1) Experimental Simple Search interface for n-grams put on line.  Analysis of queries showed that many users assumed the Explore page (now renamed "Advanced Search") was like a standard search engine page and entered entire phrases into the "Word 1" box, with unexpected results. To accommodate this behavior the Simple Search page has a single field for entering combinations of word-forms and wildwords. 
(2) Both Simple and Advanced Search now normalize search terms and remove illegal characters that prevent matching.
3 February 2004
A new database was implemented for studying "chargrams", i.e. sequences of n characters, where n falls in the range 1-3.  Occurrences of these letter sequences can be explored either by position (initial, medial, final) or by frequency in types or tokens.  Click on any chargram in the search results to see examples in words.
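For readers curious how such counts can be derived, here is a rough sketch under several assumptions: the function names and word frequencies are invented; a chargram as long as the whole word is labeled "initial"; and each word contributes at most once per chargram and position, however often the sequence recurs inside it. The site's own rules may differ on all of these points.

```python
from collections import Counter


def chargrams(word, n):
    """Yield (chargram, position) for each n-character sequence in `word`."""
    last = len(word) - n
    for i in range(last + 1):
        if i == 0:
            position = "initial"  # assumption: whole-word grams count as initial
        elif i == last:
            position = "final"
        else:
            position = "medial"
        yield word[i:i + n], position


def tally(word_freqs, n):
    """Count chargrams by types (distinct words) and tokens (occurrences)."""
    type_counts = Counter()
    token_counts = Counter()
    for word, freq in word_freqs.items():
        # set() so a word counts once per (chargram, position) pair.
        for key in set(chargrams(word, n)):
            type_counts[key] += 1
            token_counts[key] += freq
    return type_counts, token_counts


# Made-up word frequencies, for illustration only.
types, tokens = tally({"the": 100, "then": 10, "bathe": 2}, 2)
print(types[("th", "initial")], tokens[("th", "initial")])
```

Here "th" is word-initial in two types ("the", "then") that together account for 110 tokens, while it is medial only in "bathe".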
29 January 2004
Information on source texts is now available for random concordances.  Click on the three-letter source code at the end of any citation to see detailed information on the source text, e.g. (source: A6A).  Many thanks to David Lee for making his invaluable text index database available to all.
20 January 2004
After testing numerous tweaks of the "random concordances" feature to make it faster and more efficient, I have eliminated matching by POS code for this feature.  This and other optimizations lead to some spurious matches, a small (?) price for the greatly increased speed.  Please let me know via e-mail link at the bottom of the page if you would like the option to match POS codes for this functionality.  See additional details in the FAQ.
10 January 2004
Click-through concordances implemented for both n-gram and phrase-frame variant queries: click on an n-gram to show up to 50 concordances selected at random from the BNC.
5 January 2004
A reorganized database and optimized code have made n-gram queries significantly faster, especially when no filters are specified. Phrase-frames will be migrated to the new database as time permits. Numerous cosmetic changes have been made and broken links have been mended throughout the site.  Many thanks to all who took the time to report problems and make suggestions.
4 December 2003
Site relaunched on a new server at http://pie.usna.edu with a much larger database (cutoff for inclusion is now 3 or more occurrences, vs. 5 or more in the previous version of the database).  The site at http://kwicfinder.com/BNC/ will be retained (but not necessarily updated) for the benefit of those who cannot access the new site (e.g. many users in the São Paulo area).  It will eventually be converted to a proxy server for such users.
30 October 2003
"Getting Started" tutorial and "BNC Parts of Speech Tags" pages completed.
"Explore N-Grams" and "Explore Phrase-Frames": word-form and POS filter fields now show color codes when text is entered ; various cosmetic changes to query and results pages.
23 October 2003
Problem with query for forms containing apostrophe (e.g. 's, n't) and underscore (e.g. of_course, a_la) resolved.
"Save" function on "Explore N-Grams" and "Explore Phrase-Frames" results pages now works properly (available for Internet Explorer only; may not function correctly if your security settings do not permit it; see troubleshooting suggestions). Navigation and utility buttons downsized.
21 October 2003
Navigation bar added to all pages.
13 October 2003
"Getting Started" tutorial grows
"Explore N-Grams" and "Explore Phrase-Frames" now Netscape 7.0 compatible (and possibly compatible with 6)
Remaining Netscape issues: Results page "Previous / Next" buttons remain grayed out even if results are available; clickable areas lack hand cursor.
9 October 2003
Rudimentary tutorial premières
Some FAQ items fleshed out