April 18, 2012
PHP: a fractal of bad design
While I don't normally write PHP code, I have had a bit of experience, so this very thorough rant made me chuckle. I wish there were pages like this one on every programming language - I'm sure each has its own set of dust bunnies we'd rather forget.
December 13, 2011
Mobile and Web job trends
After getting a link to a "community index" for PHP, I thought I'd check with indeed.com to see what they say about a few languages. It looks like PHP isn't dead yet.
September 05, 2011
MongoDB replica sets - high level overview
Here is a very brief overview of MongoDB replica sets and a tip for enabling read access on read-only replica slaves.
A useful writeup on a cloud hosting architecture for MongoDB: http://www.codypowell.com/taods/2011/08/a-cloud-hosting-architecture-for-mongodb.html
The full definition from the MongoDB site is here: http://www.mongodb.org/display/DOCS/Replica+Sets
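For what it's worth, the usual way to enable those reads in the mongo shell is the slaveOk flag - by default the shell and drivers refuse reads against non-primary members, so you opt in per connection. A minimal sketch (the collection name is made up):
rs.slaveOk();      // allow this shell connection to read from a secondary (read-only) member
db.things.find();  // reads on the secondary now succeed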
March 29, 2011
Browser geolocation APIs
Many mobile web browsers provide access to the current geo location via JavaScript (see the W3C spec). It's very easy to use but there are a couple of gotchas to be aware of. First, not all browsers support the API so you will need to take that into consideration when designing your user experience. Next, requesting the geo location from the browser will prompt the viewer to approve the request. On every page view. This is very annoying. You should store the location data away in a cookie and only periodically request updated location information. Another nice feature is that the geolocation API allows your code to be notified as the location changes - perhaps your visitors take the bus or use their mobile devices while riding a bike. This is done with callbacks, which fits naturally with client-side development.
Here is some sample script showing how you could use this geolocation API in your mobile or location aware web apps.
// called by the browser whenever a position is available
function onLocationUpdated(position)
{
    // do something useful with the coordinates
    savePosition(position);
    // remember we already have a location so we don't re-prompt on every page view
    createCookie("s_geo", "on", 3600);
    updateLocationDisplay(position);
}

// only ask for the location if the API exists and we haven't asked recently
if (navigator.geolocation && !readCookie("s_geo"))
{
    navigator.geolocation.getCurrentPosition(onLocationUpdated);
    // also get notified as the device moves around
    var watchID = navigator.geolocation.watchPosition(
        onLocationUpdated, null, {
            enableHighAccuracy : true,
            timeout : 30000
        });
}
geonote.org - sharing the world around you
Over the past month I've put together a mobile friendly web app which lets people share notes about the places they visit. Building the basic web app for storing and sharing notes about a place was pretty straightforward, but like any new application meant to be social the biggest problem is the empty room syndrome - if there is nothing to see, most people just wander off. It takes a special person to start sharing in an empty space.
Rather than try to build up functionality and features to attract a crowd, it seemed that showing information that already exists would be a good way to bootstrap the app. Since I originally envisioned this app as something like Wikipedia for places, but more of an open medium that people can use for any purpose they can put it to, I first thought to look at ways to index Wikipedia entries by their geo location. I quickly found that other folks had already done the indexing and provided an API: geonames.org. Pulling this data in was pretty easy; they have a simple HTTP API that returns XML, which geonote.org simply formats into a mobile-friendly display. Once there was a web app for sharing notes and viewing 'atlas' pages (the Wikipedia entries), I went in search of other location-based APIs and found several great ones.
Here's the list of geo location APIs I've used so far
- GeoNames.org (Wikipedia entries and more) - http://www.geonames.org/export/ws-overview.html
- Flickr.com (Photos) - http://www.flickr.com/services/api/
- Plancast.com (Events) - http://groups.google.com/group/plancast-api/web/overview?pli=1
- Hunch (Recommendations) - http://hunch.com/developers/v1/
- Twitter (chitter chatter) - http://apiwiki.twitter.com/Twitter-API-Documentation
The Plancast crew in particular was extremely helpful. Their forum described upcoming support for searching by latitude and longitude, but it had not been released at the time. After I posted a comment they built and released that feature in only a few days (on a weekend, too!).
One of the most intriguing APIs was the Hunch API for recommendations. Although it has a lot of power, it requires a Twitter username to provide personalized recommendations, and the geonote.org app is too simple to justify real Twitter authentication integration. I'm sure I'll revisit the Hunch API though.
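For the curious, here's a rough sketch (not taken from the geonote.org code) of pulling nearby Wikipedia entries from GeoNames with node.js. It assumes their findNearbyWikipedia endpoint and the shared 'demo' account; swap in your own username and coordinates.
var http = require('http');
var lat = 47.61, lng = -122.33; // made-up coordinates
http.get({
    host: 'api.geonames.org',
    path: '/findNearbyWikipedia?lat=' + lat + '&lng=' + lng + '&username=demo'
}, function (res) {
    var xml = '';
    res.on('data', function (chunk) { xml += chunk; });
    res.on('end', function () {
        console.log(xml); // geonote.org formats this XML into a mobile friendly 'atlas' page
    });
});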
March 13, 2011
Mobile webapps and the JQuery Mobile library
Recently I've been experimenting with geo location APIs and mobile friendly web applications. Building a native mobile application felt like it would have too steep a learning curve for the minuscule amount of time I have, so I looked at what mobile browsers can deliver with just HTML, CSS and JavaScript. It turns out to be pretty easy to build a good looking mobile web application from scratch, and I found the JQuery Mobile framework works well to style pages with a native look and feel.
You can see the results at http://geonote.org/places/plans for a 'from scratch' look and http://m.geonote.org/places/plans for the JQuery Mobile look.
The first thing to take to heart is the spartan look of mobile web apps. There simply isn't room for multiple crowded top nav and side nav bars or for the data dense (but information poor) layouts of most sites. Take a look at a sample page from AllRecipes (which is a great site) - http://allrecipes.com/Cook/SHORECOOK/Photo.aspx?photoID=602783 - there are nav bars for site section, tabs, breadcrumbs, sub-page navigation and so on. Not to mention a right nav bar with even more links. These are all useful I'm sure, but for a mobile web app you need to start from a blank page, work your way up, and consider the information value of each pixel used. (Every pixel is sacred, every pixel is great. If any pixel is wasted, Tufte gets quite irate.) Another way to think of this is to consider each link as an internal advertisement for a page the user doesn't want to visit. There is a name for unwanted links on a page put there for commercial gain and that is 'spam'. Don't let your designs become link spammy.
Next, you will want to have a way to preview your web app on a mobile device. If you have a modern phone then you can use its browser and point it to your local dev environment, but another way is to use an iframe wrapped in a phone mockup. Here's the one I use: http://geonote.org/html/iphone/ There may be better mobile browser emulators but I didn't spend much time looking for something once I had the iframe based "emulator" working.
Building pages for the 'from scratch' look follows the typical web app development path - you can use most any framework you are comfortable with, but be careful with approaches that are 'client heavy'. You'll want the smallest HTML, few images and the least number of resources downloaded for rendering each page.
Many scripting libraries have a way to package only the necessary modules into a single resource - this cuts down on the network time needed to get the page rendered. Personally, I avoid client libraries since they are mostly meant for whiz-bang interactivity and on a mobile device the interaction feels better when it is as direct as possible. Common web app performance advice applies here - caching is your friend, the network is not.
The JQuery Mobile look was the most interesting part of building the UI for this site. I was really looking forward to getting a native look and feel for free. Although the library is currently in Alpha 3 stage it's very usable and I haven't run into any bugs in my limited testing. The JQuery Mobile library changes how you think of browser based pages. Not only does it try to use Ajax for most things it also introduces "compound pages" which results in an ever-growing DOM with 'sub pages' or panels that are shown and hidden during screen navigation. This allows for JQuery to perform the animated transitions between screens that give the hip 'mobile look' which is so captivating.
The downside to using an Ajax approach is the use of local anchors (the part of a URL after the '#' character) for tracking state. While this is certainly a popular and Ajaxy way of doing things, it has its problems. If you aren't familiar with the details it really mucks up how you work when building pages and can cause things to simply not work, breaking the page (and requiring the user to manually refresh it). I still don't have forms working and had to disable the Ajax loading of some pages due to this hash-based URL trickery. You will need to rigorously test all pages and the transitions between them to ensure that everything actually works.
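For what it's worth, here's the kind of workaround I mean - a sketch of globally turning off Ajax navigation so problem pages load with a normal refresh. It uses the mobileinit hook and the $.mobile.ajaxEnabled flag documented in later jQuery Mobile releases; the Alpha-era option names may differ, so treat it as illustrative.
$(document).bind("mobileinit", function () {
    // fall back to ordinary page loads instead of hash-based Ajax navigation
    $.mobile.ajaxEnabled = false;
});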
Another downside to using JQuery Mobile is that the user interaction is noticeably slower than a simple HTML and CSS page. It is almost not "interactive", which is not a good thing for client applications. There is a lot of promise though, and I haven't even looked at the built-in capabilities of JQuery Mobile for wider screen devices like tablets.
August 15, 2010
Non-blocking operations and deferred execution with node.js
If you write high volume server applications with high concurrency or low latency requirements you have probably heard about node.js. This is a relatively easy-to-understand system that came out in 2009 and has some pretty amazing characteristics. An early presentation by the main author is here - http://s3.amazonaws.com/four.livejournal/20091117/jsconf.pdf
Node.js is an environment for writing Javascript based server applications with a big twist - all IO operations are non-blocking. This non-blocking aspect introduces a concurrency model that may be new to most developers but enables node.js applications to scale to a huge number of concurrent operations - it scales like crazy.
Using non-blocking operations means code that would normally wait for data from a disk file or from a network connection does not wait and waste CPU cycles - your code returns control to the runtime environment and will be called later when the data actually is available. This allows the runtime environment to execute some other code whose data is ready at the moment and gains efficiency by avoiding context switches. This also means there is a single thread accessing data and no synchronization or semaphores are needed to prevent corruption of data due to concurrent access, making your application even more efficient.
Although writing applications in Javascript makes node.js very approachable, the use of non-blocking operations isn't very common in most server applications and results in code that looks similar but is oddly different from what is familiar to most developers. For example, consider a simple program that reads data from a file and processes that data. In a typical procedural program the steps would be:
file = open("filname"); read(file,buffer); close(file); do_something(buffer);
This pseudo-code example is easy to understand and probably familiar to most developers. The step-by-step sequence of operations is the way most languages work and how most application logic is described. However, in a non-blocking version the open() function returns immediately - even though the file is not yet open. This introduces some challenges.
file = open("filename"); // the 'file' is not yet open! what to do? read(file,buffer); close(file); do_something(buffer);
If the open() function were a blocking operation, the runtime environment would defer execution of the remaining sequence of operations until the data was available and then pick up where it left off. In node.js the way that code after a non-blocking operation is paused and picked up later is through the use of callback functions. All the steps listed after using the open() function are bundled into a new function and that bundle of steps is passed as a parameter to the open() function itself. The open() function will return immediately and your code has the choice of doing some work unrelated to the data that is not yet available or simply returning control to the runtime environment by exiting the current function.
When the data for the opened file actually does become available your callback function is invoked by the runtime and your bundle of steps will then proceed.
open("filename",function (f) { read(f,buffer); close(f); do_something(buffer); });
The parameters to the callback function are defined by the non-blocking operation. In node.js opening files uses a callback that provides an error object (in case opening the file fails) and a file descriptor that can be used to actually read data. In node.js most callback functions have an error object and a list of parameters with the desired data.
In the non-blocking example above you may have noticed the read(f,buffer) function call and guessed that this might be a non-blocking operation. This requires an additional callback function holding the remaining sequence of operations to execute once the data is read into a buffer.
open("filename",function (f) { read(f,buffer, function(err,count) { close(f); do_something(buffer); }); });
Some people feel this is a natural way to structure your code. Those people would be wrong.
Here is an actual node.js example of reading from a file:
var fs = require('fs'), sys = require('sys');
fs.open("sample.txt", 'r', 0666, function (err, fd) {
    fs.read(fd, 10000, null, 'utf8', function (err, str, count) {
        fs.close(fd);
        sys.puts(str);
    });
});
Although this may appear a bit complex for such a simple task, and you can imagine what happens with more complex application logic, the benefit of this approach becomes more apparent when thinking about more interesting situations. For example, consider reading from two files and merging the contents. Normally a program would read one file, then read another file, then merge the results. The total time taken would be the sum of the time to read each file. With non-blocking operations, reading both files can be started at the same time and the total time taken would only be the longest time to read either of the two files.
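Here's a small sketch of that two-file case (not from the example above; the file names and merge step are made up) - both reads start immediately and the merge runs when the slower one finishes:
var fs = require('fs');
var parts = {};
function mergeIfDone() {
    if ('a' in parts && 'b' in parts) {
        console.log(parts.a + parts.b); // total time is roughly max(read a, read b), not the sum
    }
}
fs.readFile('a.txt', 'utf8', function (err, data) { parts.a = data; mergeIfDone(); });
fs.readFile('b.txt', 'utf8', function (err, data) { parts.b = data; mergeIfDone(); });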
December 17, 2009
Algorithmic (almost) content creation
This article from Wired on Demand Media and their demand-based creation and delivery of 'content' describes an important movement on the Web (and off the Web too).
The choice quote is:
Instead of trying to raise the market value of online content to match the cost of producing it — perhaps an impossible proposition — the secret is to cut costs until they match the market value.
The costs to be cut are the costs of creation (manufacturing). The delivery costs are already nearly zero. Currently Demand Media is generating answers to unfulfilled questions using 'crowd sourcing' and blending media assets like video and photos and quickly written text. I wonder if someday even the text could be auto-generated.
I'm sure in the next six months we'll see a blooming of clones - 'DemandMedia for FooBar' style.
Quite a while ago I had thought about what it would take to build a content site with heavy automation on the gathering, review and approval of content. But I had not thought of optimizing that process based on audience demand. Quite clever really.
update
Just found this post on ReadWriteWeb from a writer who previously worked with Demand Media - required reading to see things from the viewpoint of someone actually creating Demand Media content.
Choice quote:
They [writers] appear to be overwhelmingly women, often with children, often English majors or journalism students, looking for a way to do what they love and make a little money at it.
Compare those demographics to Wikipedia: more than 80% male, more than 65% single, more than 85% without children, around 70% under the age of 30.
November 16, 2009
Making the Web faster - SPDY
Those crafty people at Google are doing some cool work to "make the Web faster". When I first heard of this initiative it turned out to be about how to make "pages" faster - a decent thing, but fairly well known. But recently some folks over there have started to look at the actual underlying issues with the gears grinding out the Web - mainly networking latency. Trying to improve the network protocol of the Web is a tricky thing - lots of people (and egos) can get involved. Surprisingly their effort seems to be off to a good start and everybody is taking it at face value, being supportive, and questioning things in a positive way.
One really cool thing mentioned in their whitepaper isn't a direct 'latency' thing - it's about 'server push'. If they can really make this happen a whole new world of application development would open up.
To enable the server to initiate communications with the client and push data to the client whenever possible.
November 06, 2009
IE and heinous "operation aborted" error
We ran into a heinous bug in IE regarding using Javascript to modify the DOM while the page is loading. It turns out that IE6 and IE7 will show a modal error dialog and then clear the page when the user dismisses the error message. On IE8 it was fixed to merely stop rendering the page at that point. How helpful.
You can find out more here on an MSDN blog
If you are unable to defer Javascript execution until after the page finishes loading, the following snippet may work in your use case.
// 'n' is the node you want to insert; append it to the parent of the last element
// parsed so far, rather than to document.body (which may still be open)
var tags = document.getElementsByTagName("*");
tags[tags.length-1].parentNode.appendChild(n);
September 29, 2009
Tokyo Tyrant tuning parameters
We've been working with Tokyo Tyrant for some large scale key-value lookups and the performance has been very nice, but has degraded over time. I've been poking around the various options to try to improve the performance, and although the options are documented, the pages are hard to read and it's difficult to figure out what's what. So I thought I'd collect them here for reference. I'll describe the results of tuning and tweaking in a future post.
The most recent authoritative references are here:
Tokyo Tyrant (actually Tokyo Cabinet – the storage engine) supports various types of storage – B+ Tree indexing, hash index, etc. This is configured by setting the filename or file extension to a particular value:
- If the name is "*", the database will be an in-memory hash database.
- If it is "+", the database will be an in-memory tree database.
- If its suffix is ".tch", the database will be a hash database.
- If its suffix is ".tcb", the database will be a B+ tree database.
- If its suffix is ".tcf", the database will be a fixed-length database.
- If its suffix is ".tct", the database will be a table database.
Tuning parameters can trail the filename, separated by "#". Each parameter is composed of the name and the value, separated by "=". For example, "casket.tch#bnum=1000000#opts=ld" means that the name of the database file is "casket.tch", and the bucket array size is 1000000, and the options are large and deflate.
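As a concrete illustration, a hypothetical ttserver invocation with a tuned hash database might look like the following (the sizes are made up; adjust to your data set):
ttserver -port 1978 "casket.tch#bnum=100000000#xmsiz=268435456#opts=l"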
For disk-based storage, several tuning parameters specify the on-disk layout while others specify memory and caching settings. Changing the on-disk layout requires scanning and re-writing the database data file which requires exclusive access to the file – which means taking the database offline. This scanning and re-writing process is done via tools provided with the distribution (ex: tchmgr and tcbmgr). Changing the memory and caching settings only requires a restart of Tokyo Tyrant.
We've been working only with on-disk storage via the hash and B+ Tree database engines. For a hash database the tuning parameters for the on-disk layout are limited to the size of the bucket array and the size of an element in the bucket array (choosing 'large' gets you 64-bit addressing and addressable data greater than 2GB). When a hash database file is first created, space is allocated on disk for the full bucket array. For example a database with 100M bucket size and the 'large' option would start out at around 800MB. This region of the data file is accessed via memory mapped IO. There is an additional 'extra mapped memory' setting which defaults to 64MB – I'm not sure what this is used for, but for performance more memory is always better.
For a B+ Tree database, there are additional tuning parameters for the structure of the B+ Tree – how many members (links to child nodes) in an interior non-leaf node and how many members in a leaf node. Records are not stored in the B-Tree leaf nodes, but within 'pages'. The leaf nodes point to these pages and each page holds multiple records and is accessed via an internal hash database (and since this is a B+ Tree the records within a page are of course stored in sorted order). There is also a parameter for the bucket size of this internal hash database. One subtle detail is that the bucket size for a B+Tree database is the number of pages, not the number of elements (records) being stored – so this would likely be a smaller number than a hash database for the same number of records.
I've not yet figured out how the dfunit tuning parameter works or what impact that has on a running server, but it looks interesting.
In-memory hash database
- bnum - the number of buckets
- capnum - the capacity number of records
- capsiz - the capacity size of memory to use. Note: records that exceed the capacity are removed in storing order.
In-memory tree database
- capnum - the capacity number of records
- capsiz - the capacity size of memory to use. Note: records that exceed the capacity are removed in storing order.
Hash database
- opts - "l" of large option (the size of the database can be larger than 2GB by using a 64-bit bucket array), "d" of Deflate option (each record is compressed with Deflate encoding), "b" of BZIP2 option, "t" of TCBS option
- bnum - number of elements of the bucket array. If it is not more than 0, the default value is used. The default value is 131071 (128K). The suggested size of the bucket array is about 0.5 to 4 times the number of all records to be stored.
- rcnum - maximum number of records to be cached. If it is not more than 0, the record cache is disabled. It is disabled by default.
- xmsiz - size of the extra mapped memory. If it is not more than 0, the extra mapped memory is disabled. The default size is 67108864 (64MB).
- apow - size of record alignment by power of 2. If it is negative, the default value is used. The default value is 4, standing for 2^4=16.
- fpow - maximum number of elements of the free block pool by power of 2. If it is negative, the default value is used. The default value is 10, standing for 2^10=1024.
- dfunit - unit step number of auto defragmentation. If it is not more than 0, auto defragmentation is disabled. It is disabled by default.
- mode - "w" of writer, "r" of reader, "c" of creating, "t" of truncating, "e" of no locking, "f" of non-blocking lock
B+ tree database
- opts - "l" of large option, "d" of Deflate option, "b" of BZIP2 option, "t" of TCBS option
- bnum - number of elements of the bucket array. If it is not more than 0, the default value is used. The default value is 32749 (32K). The suggested size of the bucket array is about 1 to 4 times the number of all pages to be stored.
- nmemb - number of members in each non-leaf page. If it is not more than 0, the default value is used. The default value is 256.
- ncnum - maximum number of non-leaf nodes to be cached. If it is not more than 0, the default value is used. The default value is 512.
- lmemb - number of members in each leaf page. If it is not more than 0, the default value is used. The default value is 128.
- lcnum - maximum number of leaf nodes to be cached. If it is not more than 0, the default value is used. The default value is 1024.
- apow - size of record alignment by power of 2. If it is negative, the default value is used. The default value is 8, standing for 2^8=256.
- fpow - maximum number of elements of the free block pool by power of 2. If it is negative, the default value is used. The default value is 10, standing for 2^10=1024.
- xmsiz - size of the extra mapped memory. If it is not more than 0, the extra mapped memory is disabled. It is disabled by default.
- dfunit - unit step number of auto defragmentation. If it is not more than 0, auto defragmentation is disabled. It is disabled by default.
- mode - "w" of writer, "r" of reader, "c" of creating, "t" of truncating, "e" of no locking, "f" of non-blocking lock
Fixed-length database
- width - width of the value of each record. If it is not more than 0, the default value is used. The default value is 255.
- limsiz - limit size of the database file. If it is not more than 0, the default value is used. The default value is 268435456 (256MB).
- mode - "w" of writer, "r" of reader, "c" of creating, "t" of truncating, "e" of no locking, "f" of non-blocking lock
Table database
- opts - "l" of large option, "d" of Deflate option, "b" of BZIP2 option, "t" of TCBS option
- idx - specifies the column name of an index and its type, separated by ":"
- bnum - number of elements of the bucket array. If it is not more than 0, the default value is used. The default value is 131071. The suggested size of the bucket array is about 0.5 to 4 times the number of all records to be stored.
- rcnum - maximum number of records to be cached. If it is not more than 0, the record cache is disabled. It is disabled by default.
- lcnum - maximum number of leaf nodes to be cached. If it is not more than 0, the default value is used. The default value is 4096.
- ncnum - maximum number of non-leaf nodes to be cached. If it is not more than 0, the default value is used. The default value is 512.
- xmsiz - size of the extra mapped memory. If it is not more than 0, the extra mapped memory is disabled. The default size is 67108864 (64MB).
- apow - size of record alignment by power of 2. If it is negative, the default value is used. The default value is 4, standing for 2^4=16.
- fpow - maximum number of elements of the free block pool by power of 2. If it is negative, the default value is used. The default value is 10, standing for 2^10=1024.
- dfunit - unit step number of auto defragmentation. If it is not more than 0, auto defragmentation is disabled. It is disabled by default.
- mode - "w" of writer, "r" of reader, "c" of creating, "t" of truncating, "e" of no locking, "f" of non-blocking lock
September 20, 2009
PubSubHubBub - feed futures
Cool - Bob Wyman is involved in the PubSubHubBub discussion group. In this post he hints at content-based routing - not just topic based routing - being possible in the future with PSHB. It's time to find some excuse to use this new PSHB technology at my day job.
For instance, while today we think mostly about "topic-based" distribution -- i.e. subscribing to known feeds by name, in the future, people might like to subscribe to "concepts" or "words" that appear in the content of updates. Rather than saying "Tell me whenever Tom's feed changes!", you might like to say: "Tell me whenever any feed mentions PSHB." In that case, down stream systems are going to want to have the content (not just a notification of change) in order to match updates to subscriptions.
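To make the mechanics concrete, here's a rough sketch of what a basic PSHB subscription request looks like as I read the spec - an HTTP POST of form-encoded hub.* parameters to the hub. The hub, topic and callback URLs below are placeholders, not real endpoints.
var http = require('http');
var querystring = require('querystring');
var body = querystring.stringify({
    'hub.mode': 'subscribe',
    'hub.topic': 'http://example.com/feed.xml',          // the feed to watch
    'hub.callback': 'http://example.com/pshb-callback',  // where the hub POSTs updates
    'hub.verify': 'async'
});
var req = http.request({
    host: 'hub.example.com', // placeholder hub
    path: '/',
    method: 'POST',
    headers: {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Content-Length': body.length
    }
}, function (res) {
    console.log('hub responded: ' + res.statusCode); // a 2xx response generally means the subscription was accepted
});
req.write(body);
req.end();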
September 18, 2009
Real-time web, take 2
Bernard Lunn has a good post over on ReadWriteWeb putting the recent PubSubHubBub/RSSCloud news into context. Very funny that he calls KnowNow a "blow out", but I think he correctly identified their issue as a focus on the enterprise market (when that market had fairly established solutions).
Wish I hadn't been so busy over the past two years and could have worked on helping build PubSubHubBub-style technology.
May 22, 2009
Real-time Web just around the corner
The ReadWriteWeb blog has a good post about gathering momentum for a resurgence of interest in real-time search and notifications. I don't think the examples he points to will push it into the mainstream - that functionality has been around in many forms for many years (I even built searchalert.net seven or eight years ago to do that). I do think something will happen, but I'm not sure what application of this technology will make it to the big time.
The Real Time Web is coming so fast we've hardly had any time to think about it yet. So let's do that, shall we? The two hottest technologies online, Twitter and Facebook, are fast integrating real-time delivery of activity streams to their users. Paul Buchheit, the man who built the first versions of both Gmail and Adsense, says the real time web is going to be the next big thing. Buchheit's FriendFeed is a key point of innovation in real time. Social media ping server Gnip promised to turn everything online into Instant Messaging-style XMPP feeds, and though that's been put on hold in favor of more immediately clear value - we've still got our fingers crossed.
April 23, 2009
Above the Clouds whitepaper
Here's a whitepaper on Cloud Computing from the UC Berkeley RAD Lab - just what everyone has been waiting for, a whitepaper on Cloud Computing.
In part this describes obstacles and opportunities. My personal favorite:
Obstacle #6: Scalable Storage
Opportunity #6: Invent Scalable Store
That's right, we finally have the go-ahead to Invent Scalable Store.
The paper gets better the more you read. Another great quote:
Google Search is effectively the dial tone of the Internet: if people went to Google for search and it wasn’t available, they would think the Internet was down
March 02, 2009
Yahoo Query Language and Open Tables
I've been looking at the Yahoo Open Data Tables and Query Language documentation. This is truly amazing stuff! It provides a service API that accesses many well known data sources (many are Yahoo) and transforms the data into XML or JSON. The data sources can be external URLs that provide XML and Yahoo does the fetch, parse, extract and transform that you want. You can provide a definition of some other external data source and they will hook it into their unified API fetch/query/transform service.
Some of the data sources are Flickr, local listings, geo location info, web search, image search, news search, weather and so on. One stop shopping for lots of great data.
Their console http://developer.yahoo.com/yql/console/ is a great way to see what's possible.
This is what I've wanted for many years. A long time ago I wanted to build a service that would provide "XML data sources" (I even registered xmldatasource.com) for everything available on the Web - now it looks like Yahoo has actually done it.
Let's hope they keep this data access service open to all.
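To give a flavor of the query language, here's a rough example from memory (not from Yahoo's docs) - a YQL statement and the public REST endpoint it gets URL-encoded into; the table name and query text are just illustrative:
select * from geo.places where text="seattle, wa"
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20geo.places%20where%20text%3D%22seattle%2C%20wa%22&format=json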
July 30, 2008
WebHooks
This looks interesting - in a 'teach people how the Web really works' kind of way. WebHooks is a catch phrase for Web application development where notifications are sent from the source to the listener via HTTP POST, rather than having the listener poll for changes (which, as some have said, doesn't scale).
Somewhat related to my earlier post on how HTTP can be used as an alternate to XMPP/Jabber in a publish/subscribe scenario without too much problem or angst.
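As a sketch of the listener side (not from the WebHooks write-up; the hook path and port are made up), a tiny node.js handler that the source would POST notifications to might look like this:
var http = require('http');
http.createServer(function (req, res) {
    if (req.method === 'POST' && req.url === '/hooks/update') { // made-up hook path
        var body = '';
        req.on('data', function (chunk) { body += chunk; });
        req.on('end', function () {
            console.log('notification received: ' + body);
            res.writeHead(200);
            res.end();
        });
    } else {
        res.writeHead(404);
        res.end();
    }
}).listen(8080);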
July 24, 2008
REST and pub/sub
It's unfortunate that technologists continue to propagate serious mistakes like "[...] its also clear that REST and its inherent polling mechanism isn't the best way of building a user notification system [...]"
REST is about state transfer - and event notifications are also state transfer.
As for HTTP, it isn't only "polling" - anyone that has posted a blog entry knows that. The 'client' can 'post' updates to the 'server' - exactly the same as event notifications via XMPP. The great thing about XMPP is the federated multi-hop capability with 'trust' built-in. Just like email, only with everyone using settings for very low latency delivery.
There have been multiple publish/subscribe over HTTP mechanisms (Comet, mod_pubsub, KnowNow, etc.) over the years.
May 12, 2008
Oh, the irony of shallow WSDL
I'm looking into the API for the Hi5 social network and unfortunately found some WSDL. They also have a REST API, but its documentation appears auto-generated from WSDL that nobody actually filled in. Somewhat useless.
Normally I wouldn't post about WSDL, but I couldn't pass up the irony of the WSDL documentation for the authentication service API having an HTML form describing how to authenticate the user. If that's not irony, I don't know what is.
May 09, 2008
Ultimate Twitter revenue model - chatbots??
From ReadWrite Web:
"Essentially, this would entail Twitter parsing over the Tweets of a given user, as well as the Tweets of the users he/she is following. Common keywords, themes, and phrases are then pulled from this data and associated with that user. As a result, highly-targeted ads can be displayed based on the user's network of content ("web design", for example). These simple text ads would look very similar to regular Tweets, but would be clearly marked as "Sponsored Content"."
I think chatbots haven't worked for a reason - people want to chat, not shop.
Reading RWW and other pundit blogs that describe "how the future will work" reminds me of reading Popular Science as a kid and gazing in wonder at the flying cars and transparent houses soon to be built.